Abstract


A celebrated 1976 theorem of Aumann asserts that honest, rational Bayesian agents with common priors will never "agree to disagree": if their opinions about any topic are common knowledge, then those opinions must be equal. Economists have written numerous papers examining the assumptions behind this theorem. But two key questions went unaddressed: first, can the agents reach agreement after a conversation of reasonable length? Second, can the computations needed for that conversation be performed efficiently? This paper answers both questions in the affirmative, thereby strengthening Aumann's original conclusion.

We first show that, for two agents with a common prior to agree within $\varepsilon$ about the expectation of a [0,1] variable with high probability over their prior, it suffices for them to exchange order $1/\varepsilon^2$ bits. This bound is completely independent of the number of bits n of relevant knowledge that the agents have. We then extend the bound to three or more agents; and we give an example where the economists' "standard protocol" (which consists of repeatedly announcing one's current expectation) nearly saturates the bound, while a new "attenuated protocol" does better. Finally, we give a protocol that would cause two Bayesians to agree within $\varepsilon$ after exchanging order $1/\varepsilon^2$ messages, and that can be simulated by agents with limited computational resources. By this we mean that, after examining the agents' knowledge and a transcript of their conversation, no one would be able to distinguish the agents from perfect Bayesians. The time used by the simulation procedure is exponential in $1/\varepsilon^6$ but not in n.

1 Introduction


A vast body of work in AI, economics, philosophy, and other fields seeks to model human beings as Bayesian agents: agents that start out with some prior probability distribution over possible states of the world, then update the distribution as they gather new information [21]. Because of its simplicity, the "humans-as-roughly-Bayesians" thesis has remained popular, despite the work of Allais [1], Tversky and Kahneman [19], and others; and despite well-known problems such as old evidence [7]. But one aspect of human experience seems especially hard to reconcile with the thesis.

Pick any two people, and there will be some topic they disagree about: capitalism versus socialism, the 

Israeli-Palestinian conflict, the interpretation of quantum mechanics, etc. The more intelligent the people,

the easier it will be to find such a topic. If they discuss the topic, chances are excellent that they will not 

reach agreement, but will instead become more confirmed in their previous beliefs. This is so even if the 

people respect each other's intelligence and honesty. 

The above facts are known to everyone, yet as Aumann [2] observed in 1976, they constitute a serious
challenge to Bayesian accounts of human reasoning. For suppose Alice and Bob are Bayesians, who have 
the same prior probabilities for all states of the world, but who have since gained different knowledge and 
thus have different posterior probabilities. Suppose further that, conditioned on everything she knows, 
Alice assigns a posterior probability p to (say) extraterrestrial life existing. Bob likewise assigns a posterior 
probability q. Then provided both agents know p and q (and know that they know them, etc.), Aumann 
showed that p and q must be equal. This is true even if neither agent has any idea on what sort of evidence 
the other's estimate is based. For the sort of evidence can itself be considered a random variable, which is 
ultimately governed by a prior probability distribution that is the same for both agents. 

Admittedly, the agents are unlikely to agree immediately after exchanging p and q. For conditioned on 
Alice's estimate being p, Bob will revise his estimate q, and similarly Alice will revise p conditioned on Bob's 
estimate q. The agents will then have to exchange their new estimates p' and q', and so on iteratively.
But provided the set of possible states is finite, it is easy to show that this iterative process must terminate 
eventually, with both agents having the same estimate [5]. In conclusion, then, there is no reason for the 
agents ever to disagree about anything! 

On hearing this theorem for the first time, all of us come up with plausible ways in which actual human 
beings might evade its conditions. People have self-serving biases; they often discard or distort evidence that 
goes against what they want to believe [?]. (According to an often-cited study [?], 94% of professors consider
themselves better than their average colleagues.) People might interpret the same assertion differently.
Or the assertion might be inherently ambiguous, if it deals with beauty or morality for instance. People 
might weigh the same evidence by different criteria. They might not understand the evidence. They might 
defend their opinions as high-school debaters do, out of sport rather than a desire for truth. They might 
not report their opinions with candor; or if they do, they might not trust others to do likewise. 

In our view, the real challenge is not to list such caveats, but to sift through them and to discover 
which ones are fundamental. As an illustration, several of the caveats listed above disappear once we 
assume that all people have a common prior. For among other things, such a prior would assign common 
probabilities to all possible ways of parsing an ambiguous sentence, and to all possible ways of weighing 
evidence. Understandably, then, much of the criticism of Aumann's theorem has focused on the common 
prior assumption (see [?] for a discussion of that assumption).

But suppose we accept that two people have different priors. The obvious question is, what caused their 
priors to differ? Different career choices? Different friends? Different kindergarten teachers? Whatever 
is named as the first influence, we need merely go back in time to before that influence took effect. At the 
earlier time, the two people had the same prior by assumption. So at later times, they would not really 
have different priors, just different posteriors obtained by starting from the same prior and then conditioning 
on different life experiences. If we push this reasoning to its limit, as Cowen and Hanson [?] do, we are
left wondering whether prior differences could be encoded in DNA at conception. Even then, how much 
confidence should you place in an opinion, if you know that were your genes different, you would have the 
opposite opinion? More generally, on what grounds can you favor your own prior over another's? For all 
you know, your prior was "switched by accident" with someone else's at birth! 

After staring into the metaphysical abyss of prior differences, the natural reaction of a computer scientist 
is to step back, and ask if there is some simpler explanation for why Aumann's theorem fails to describe 
the real world. Recall that in the theorem, Alice's and Bob's opinions only became equal by the end of a 
hypothetical conversation. Might that conversation last an absurdly long time? After all, if Alice and Bob 
exchanged everything they knew, then clearly they would agree about everything! But presumably they are 
not Siamese twins, and do not have their entire lives to talk to each other. Thus communication complexity 
might provide a fundamental reason for why even honest, rational people could agree to disagree. Indeed, 
this was our conjecture when we began studying the topic. 

Computational complexity provides a second promising reason. If a "state of the world" consists of n 
bits, then Aumann's theorem requires Alice and Bob to represent a prior probability distribution over $2^n$
possible states. Even worse, it requires them to calculate expectations over that distribution, and update it 
conditioned on new information. If n is (say) 10000, then this is obviously too much to ask. 

1.1 Summary of Results 

This paper initiates the study of the communication complexity and computational complexity of agreement 
protocols. Its surprising conclusion is that complexity is not a major barrier to agreement — at least, not 
nearly as major as it seems from the above arguments. In our view, this conclusion strengthens Aumann's 
original theorem substantially, by forcing our attention back to the origin of prior differences. 

For economists, the main novelty of the paper will be our relentless use of asymptotic analysis. We will 
never be satisfied to show that a protocol terminates eventually. Instead we will always ask: do the resources 
needed for the protocol scale 'reasonably' with the parameters of the problem being solved? Here 'resources' 
include the number of messages, the number of bits per message, and the number of computational steps; 
while 'parameters' include the number of agents, the number of bits each agent is given, and the desired 
accuracy and probability of success. This approach will let us model the limitations of real-world agents
without sacrificing simplicity and elegance. 

For computer scientists, the main novelty will be that, when we analyze the communication complexity 
of a function $f$, we care only about how long it takes some set of agents to agree among themselves about
the expectation of $f$. Whether the agents' expectations agree with external reality is irrelevant.

After introducing notation in Section 2, in Section 3 we present our first set of results, which concern the
communication complexity of agreement.

Section 3.1 studies the "economists' standard protocol," introduced by Geanakoplos and Polemarchakis [8] and alluded to earlier. In that protocol, Alice and Bob repeatedly announce their current expectations of a [0,1] random variable, conditioned on all previous announcements. The question we ask is how many messages are needed before the agents' expectations agree within $\varepsilon$ with probability at least $1-\delta$ over their prior, given parameters $\varepsilon$ and $\delta$. We show that order $1/(\delta\varepsilon^2)$ messages suffice. We then show that order $1/(\delta\varepsilon^2)$ messages still suffice if, instead of sending their whole expectations (which are real numbers), the agents send "summary" messages consisting of only 2 bits each. What makes these upper bounds surprising is that they are completely independent of n, the number of bits needed to represent the agents' knowledge. By contrast, in ordinary communication complexity (see [14]), it is easy to show that given a random function $f:\{0,1\}^n\times\{0,1\}^n\to[0,1]$, Alice and Bob would need to exchange order n bits to approximate $f$ to within (say) 1/10 with high probability.

Given the results of Section 3.1, several questions demand our attention. Is the upper bound of $1/(\delta\varepsilon^2)$ bits tight, or can it be improved even further? Also, is the economists' standard protocol always optimal, or do other protocols sometimes need even less communication? Section 3.2 addresses these questions. Though we are unable to show any lower bound better than $\log(1/\varepsilon)$ that applies to all protocols, we do give examples where the standard protocol needs almost $1/\varepsilon^2$ bits. We also show that the standard protocol is not optimal: there exist cases where the standard protocol uses almost $1/\varepsilon^2$ bits, while a new protocol (which we call the attenuated protocol) uses fewer bits.

In earlier work, Parikh and Krasucki [15] extended Aumann's agreement theorem to three or more agents, who send messages along the edges of a directed graph. Thus, it is natural to ask whether our efficient agreement theorem extends to this setting as well. Section 3.3 shows that it does: given N agents with a common prior, who send messages along a strongly connected graph of diameter d, order $Nd^2/(\delta\varepsilon^2)$ messages suffice for every pair of agents to agree within $\varepsilon$ about the expectation of a [0,1] random variable with probability at least $1-\delta$ over their prior.

In Section 4 we shift attention to the computational complexity of agreement, the subject of our technically
most interesting result. What we want to show is that, even if two agents are computationally bounded, 
after a conversation of reasonable length they can still probably approximately agree about the expectation 
of a [0, 1] random variable. A large part of the problem is to say what this even means. After all, if 
the agents both ignored their evidence and estimated (say) 1/2, then they would agree before exchanging a 
single message! So agreement is only interesting if the agents have made some sort of "good-faith effort" to 
emulate Bayesian rationality. 

Although we leave unspecified exactly what effort is necessary, we do propose a criterion that we think 
is certainly sufficient. This is that the agents be able to simulate a Bayesian agreement protocol, in such 
a way that a computationally-unbounded referee, given the agents' knowledge together with a transcript of 
their conversation, be unable to decide (with non-negligible bias) whether the agents are computationally 
bounded or not. The justification for this criterion is that, just as Turing [18] argued that a perfect simulation
of thinking is thinking, so it seems to us that a statistically perfect simulation of Bayesian rationality is 
Bayesian rationality. 

But what do we mean by computationally-bounded agents? We discuss this question in detail in Section 4,
but the basic point is that we assume two "subroutines": one that computes the [0,1] variable of interest,
given a state of the world $\omega$; and another that samples a state $\omega$ from any set in either agent's initial
knowledge partition. The complexity of the simulation procedure is then expressed in terms of the number 
of calls to these subroutines. 

Unfortunately, there is no way to simulate the economists' standard protocol — even our discretized version 
of it — using a small number of subroutine calls. The reason is that Alice's ideal estimate p might lie on a "knife-edge" between the set of estimates that would cause her to send message $m_1$ to Bob, and the set that would cause her to send a different message $m_2$. In that case, it does not suffice for her to approximate p using random sampling; she needs to determine it exactly. Our solution, which we develop in Section 4.1, is to have the agents "smooth" their messages by adding random noise to them. By hiding small errors in the agents' estimates, such noise makes the knife-edge problem disappear. On the other hand, we show that in the computationally-unbounded case, the noise does not prevent the agents from agreeing within $\varepsilon$ with probability $1-\delta$ after order $1/(\delta\varepsilon^2)$ messages. In Sections 4.2 and 4.3 we prove the main result: that the smoothed standard protocol can be simulated using a number of subroutine calls that depends only on $\varepsilon$ and $\delta$, not on n. The dependence, alas, is exponential in $1/(\delta^6\varepsilon^6)$, so our simulation procedure is still not practical. However, we expect that both the procedure and its analysis can be considerably improved.

We conclude in Section 5 with some suggestions for future research, and some speculations about the
causes of disagreement. 

2 Preliminaries 

Let $\Omega$ be a set of possible states of the world. Throughout this paper, $\Omega$ will be finite, both for simplicity of presentation and because we do not believe that any physically realistic agent can ever have more than finitely many possible experiences. Let $\mathcal{D}$ be a prior probability distribution over $\Omega$ that is shared by some set of agents. We can assume $\mathcal{D}$ assigns nonzero probability to every $\omega\in\Omega$, for if not, we simply remove the probability-0 states from $\Omega$. Whenever we talk about a probability or expectation over a subset $S$ of $\Omega$, unless otherwise indicated we mean that we start from $\mathcal{D}$ and conditionalize on $\omega\in S$.

Throughout this paper, we will consider protocols in which agents send messages to each other in some order. Let $\Omega_{i,t}(\omega)$ be the set of states that agent $i$ considers possible immediately after the $t$-th message has been sent, given that the true state of the world is $\omega$.$^1$ Then $\omega\in\Omega_{i,t}(\omega)\subseteq\Omega$, and indeed the set $\{\Omega_{i,t}(\omega)\}_{\omega\in\Omega}$ forms a partition of $\Omega$. Furthermore, since the agents never forget messages, we have $\Omega_{i,t}(\omega)\subseteq\Omega_{i,t-1}(\omega)$. Thus we say that the partition $\{\Omega_{i,t}(\omega)\}_{\omega\in\Omega}$ refines $\{\Omega_{i,t-1}(\omega)\}_{\omega\in\Omega}$, or equivalently that $\{\Omega_{i,t-1}(\omega)\}_{\omega\in\Omega}$ coarsens $\{\Omega_{i,t}(\omega)\}_{\omega\in\Omega}$. (As a convention, we freely omit arguments of $\omega$ when doing so will cause no confusion.) Notice also that if the $t$-th message is not sent to agent $i$, then $\Omega_{i,t}(\omega)=\Omega_{i,t-1}(\omega)$.

Now let $f:\Omega\to[0,1]$ be a real-valued function that the agents are interested in estimating. The assumption $f(\omega)\in[0,1]$ is without loss of generality: since $\Omega$ is finite, any function from $\Omega$ to $\mathbb{R}$ has a bounded range, which we can take to be [0,1] by rescaling. We can think of $f(\omega)$ as the probability of some future event conditioned on $\omega$, but this is not necessary. Let $E_{i,t}(\omega)=\operatorname{EX}_{\omega'\in\Omega_{i,t}(\omega)}[f(\omega')]$ be agent $i$'s expectation of $f$ at step $t$, given that the true state of the world is $\omega$. Also, let $\Theta_{i,t}(\omega)=\{\omega' : E_{i,t}(\omega')=E_{i,t}(\omega)\}$ be the set of states for which agent $i$'s expectation of $f$ equals $E_{i,t}(\omega)$. Then the partition $\{\Theta_{i,t}(\omega)\}_{\omega\in\Omega}$ coarsens $\{\Omega_{i,t}(\omega)\}_{\omega\in\Omega}$, and $E_{i,t}(\omega)=\operatorname{EX}_{\omega'\in\Theta_{i,t}(\omega)}[f(\omega')]$.

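To make the definitions of $E_{i,t}$ and $\Theta_{i,t}$ concrete, here is a small Python sketch on a finite state space. It is an illustration under our own conventions, not part of the formal development: a partition is stored as a list `labels` in which states sharing a label lie in the same cell, probabilities are exact `fractions.Fraction` values, and the helper names `cond_expectation` and `theta_partition` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def cond_expectation(prior, f, labels):
    """E(w) = EX_{w' in cell(w)}[f(w')], where cell(w) = {w' : labels[w'] == labels[w]}."""
    mass = defaultdict(Fraction)      # prior mass of each cell
    weighted = defaultdict(Fraction)  # prior-weighted sum of f over each cell
    for w, cell in enumerate(labels):
        mass[cell] += prior[w]
        weighted[cell] += prior[w] * f[w]
    return [weighted[cell] / mass[cell] for cell in labels]

def theta_partition(E):
    """Theta(w) = {w' : E(w') == E(w)}: use the exact expectation value as the cell label."""
    return list(E)

# Hypothetical example: six equally likely states, knowledge partition {0,1}, {2,3}, {4,5}.
prior = [Fraction(1, 6)] * 6
f = [1, 0, 0, 1, 1, 1]
omega_labels = [0, 0, 1, 1, 2, 2]

E = cond_expectation(prior, f, omega_labels)
theta = theta_partition(E)
print(E)                # cells {0,1} and {2,3} both have expectation 1/2; cell {4,5} has 1
print(len(set(theta)))  # Theta merges the first two cells: 2 cells instead of 3
```

Using the expectation value itself as the label is what makes $\Theta_{i,t}$ coarser than $\Omega_{i,t}$: distinct cells with equal expectations collapse into one.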
The following simple but important fact is due to Hanson [11].

Proposition 1 ([11]) Suppose the partition $\{\Omega_{i,t}(\omega)\}_{\omega\in\Omega}$ refines $\{\Theta_{j,u}(\omega)\}_{\omega\in\Omega}$. Then

$$\operatorname{EX}_{\omega'\in\Omega_{j,u}(\omega)}[E_{i,t}(\omega')]=\operatorname{EX}_{\omega'\in\Theta_{j,u}(\omega)}[E_{i,t}(\omega')]=E_{j,u}(\omega)$$

for all $\omega\in\Omega$. As a consequence, an agent's expectation of its future expectation of $f$ always equals its current expectation. As another consequence, if Alice has just communicated her expectation of $f$ to Bob, then Alice's expectation of Bob's expectation of $f$ equals Alice's expectation.

Proof. In each case, we are taking the expectation of $f$ over a subset $S\subseteq\Omega$ (either $\Omega_{j,u}(\omega)$ or $\Theta_{j,u}(\omega)$) for which $\operatorname{EX}_{\omega'\in S}[f(\omega')]=E_{j,u}(\omega)$. How $S$ is "sliced up" has no effect on the result. ∎

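A quick numerical sanity check of Proposition 1 can be run in the same label-based representation. The sketch below (toy instance and helper names hypothetical) lets Alice announce $E_{A,0}$, so that Bob's new partition refines $\Theta_{A,0}$, and then checks the $\Theta$ form of the proposition, $\operatorname{EX}_{\omega'\in\Theta_{A,0}(\omega)}[E_{B,1}(\omega')]=E_{A,0}(\omega)$, together with the special case $i=j$ in which an agent averages its own later expectation over its current cell. Exact rational arithmetic lets the equalities be tested with `==`.

```python
import random
from collections import defaultdict
from fractions import Fraction

def cond_expectation(prior, values, labels):
    """Average `values` over the prior restricted to each state's cell (cell = shared label)."""
    mass, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for w, cell in enumerate(labels):
        mass[cell] += prior[w]
        weighted[cell] += prior[w] * values[w]
    return [weighted[cell] / mass[cell] for cell in labels]

rng = random.Random(0)
N = 4  # hypothetical toy instance: state w encodes (x, y) = divmod(w, N)
weights = [rng.randint(1, 9) for _ in range(N * N)]
prior = [Fraction(wt, sum(weights)) for wt in weights]
f = [Fraction(rng.randint(0, 10), 10) for _ in range(N * N)]
alice = [w // N for w in range(N * N)]  # Alice knows x
bob = [w % N for w in range(N * N)]     # Bob knows y

E_A = cond_expectation(prior, f, alice)
theta_A = E_A  # Theta_A: states grouped by Alice's expectation value
# Alice announces E_A; Bob keeps only states giving the same announcement,
# so his new partition refines Theta_A.
bob_after = [(bob[w], E_A[w]) for w in range(N * N)]
E_B = cond_expectation(prior, f, bob_after)

# Theta form of Proposition 1: averaging Bob's new expectation over Theta_A(w) gives E_A(w).
assert cond_expectation(prior, E_B, theta_A) == E_A

# Special case i = j: Alice's expectation of her own later expectation
# (after she hears Bob) equals her current expectation.
alice_after = [(alice[w], E_B[w]) for w in range(N * N)]
E_A_later = cond_expectation(prior, f, alice_after)
assert cond_expectation(prior, E_A_later, alice) == E_A
print("Proposition 1 checks pass on this instance")
```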
Proposition 1 already demonstrates a dramatic difference between Bayesian agreement protocols and
actual human conversations. Suppose Alice and Bob are discussing whether useful quantum computers will 
be built by the year 2050. Bob says that, in his opinion, the chance of this happening is only 5%. Alice 
says she disagrees: she thinks the chance is 90%. How much should Alice expect her reply to influence 
Bob's estimate? Should she expect him to raise it to 10%, or even 15%, out of deference to his friend Alice's 
judgment? According to Proposition 1, she should expect him to raise it to 90%! That is, depending on
what else Bob knows, his new estimate might be 85% or 95%, but its expectation from Alice's point of view 
is 90%. 



$^1$ We assume for now that messages are "noise-free"; that is, they partition the state space sharply. Later we will remove
this assumption.
Figure 1: After Alice tells Bob whether $E_{A,0}$ is 1 or 0, Bob's partition $\{\Omega_{B,0}(\omega)\}_{\omega\in\Omega}$ is refined to $\{\Omega_{B,1}(\omega)\}_{\omega\in\Omega}$.

2.1 Miscellany 

Asymptotic notation is standard: $F(n)=O(G(n))$ means there exist positive constants $a,b$ such that $F(n)\le a+bG(n)$ for all $n\ge 0$; $F(n)=\Omega(G(n))$ means the same but with $F(n)\ge a+bG(n)$; $F(n)=\Theta(G(n))$ means $F(n)=O(G(n))$ and $F(n)=\Omega(G(n))$; and $F(n)=o(G(n))$ means $F(n)=O(G(n))$ and not $F(n)=\Omega(G(n))$.

We will have several occasions to use the following well-known bound. 

Theorem 2 (Chernoff, Hoeffding) Let $x_1,\ldots,x_K$ be $K$ independent samples of a [0,1] random variable with mean $\mu$. Then for all $\alpha\in(0,1)$,

$$\Pr[x_1+\cdots+x_K<(1-\alpha)\mu K]<e^{-\alpha^2\mu K/2},$$
$$\Pr\left[|x_1+\cdots+x_K-\mu K|>\alpha K\right]<2e^{-2\alpha^2K}.$$

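The bounds in Theorem 2 can be sanity-checked by simulation. This Python sketch uses Bernoulli($\mu$) samples as one illustrative [0,1] random variable (the theorem covers any such variable) and compares empirical tail frequencies with both bounds; the parameter values are arbitrary choices, not taken from the paper.

```python
import math
import random

def tail_frequencies(mu, K, alpha, trials, seed=1):
    """Empirical frequencies of the two tail events in Theorem 2,
    using Bernoulli(mu) samples as an illustrative [0,1] random variable."""
    rng = random.Random(seed)
    lower = two_sided = 0
    for _ in range(trials):
        s = sum(1 for _ in range(K) if rng.random() < mu)  # x_1 + ... + x_K
        if s < (1 - alpha) * mu * K:
            lower += 1
        if abs(s - mu * K) > alpha * K:
            two_sided += 1
    return lower / trials, two_sided / trials

mu, K, alpha = 0.5, 100, 0.1
lower, two_sided = tail_frequencies(mu, K, alpha, trials=2000)
lower_bound = math.exp(-alpha**2 * mu * K / 2)     # multiplicative (Chernoff) form
two_sided_bound = 2 * math.exp(-2 * alpha**2 * K)  # additive (Hoeffding) form
print(f"lower tail: {lower:.4f} <= {lower_bound:.4f}")
print(f"two-sided:  {two_sided:.4f} <= {two_sided_bound:.4f}")
```

The bounds are far from tight at these parameters; their value in the paper lies in the exponential dependence on $K$.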
3 Communication Complexity 

We now introduce and justify the communication complexity model. Assume for the moment that there are two agents, Alice ($A$) and Bob ($B$); Section 3.3 will generalize the model to three or more agents. We can imagine if we like that Alice and Bob are given n-bit strings $x$ and $y$ respectively, so that $\Omega\subseteq\{0,1\}^n\times\{0,1\}^n$. Letting $\omega=(x,y)$, we then have $\Omega_{A,0}(\omega)\subseteq\{x\}\times\{0,1\}^n$ and $\Omega_{B,0}(\omega)\subseteq\{0,1\}^n\times\{y\}$.

In an agreement protocol, Alice and Bob take turns sending messages to each other. Any such protocol is characterized by a sequence of functions $m_1,m_2,\ldots:2^\Omega\to\mathcal{M}$, known to both agents, which map subsets of $\Omega$ to elements of a message space $\mathcal{M}$. Possibilities for $\mathcal{M}$ include [0,1] in a continuous protocol, or $\{0,1\}$ in a discretized protocol. In all protocols considered in this paper, the $m_t$'s will be extremely simple; for example, we might have $m_t(S)=\operatorname{EX}_{\omega'\in S}[f(\omega')]$, the agent's current expectation of $f$.

The protocol proceeds as follows: first Alice computes $m_1(\Omega_{A,0}(\omega))$ and sends it to Bob. After seeing Alice's message, and assuming the true state of the world is $\omega$, Bob's new set of possible states becomes

$$\Omega_{B,1}(\omega)=\Omega_{B,0}(\omega)\cap\{\omega' : m_1(\Omega_{A,0}(\omega'))=m_1(\Omega_{A,0}(\omega))\}$$

as in Figure 1. Then Bob computes $m_2(\Omega_{B,1}(\omega))$ and sends it to Alice, whereupon Alice's set of possible states becomes

$$\Omega_{A,2}(\omega)=\Omega_{A,0}(\omega)\cap\{\omega' : m_2(\Omega_{B,1}(\omega'))=m_2(\Omega_{B,1}(\omega))\}.$$

Then Alice computes $m_3(\Omega_{A,2}(\omega))$ and sends it to Bob, and so on.

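The update rule above translates directly into code. The following Python sketch is one possible rendering under our own conventions (partitions as label lists; `cells`, `receive`, and `m_expectation` are hypothetical names): the receiver's new label is the pair (old label, message), which is exactly the intersection of its old cell with $\{\omega' : m_t(\Omega_S(\omega'))=m_t(\Omega_S(\omega))\}$, where $S$ is the sender. In the toy instance, in the spirit of Figure 1, Alice's expectation reveals her bit, so Bob's two cells split into four.

```python
from fractions import Fraction
from itertools import product

def cells(labels):
    """Map each label to the frozenset of states carrying it (the cells of the partition)."""
    groups = {}
    for w, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(w)
    return {lab: frozenset(s) for lab, s in groups.items()}

def receive(receiver, sender, message_fn):
    """Receiver's new partition: Omega_R(w) ∩ {w' : m(Omega_S(w')) = m(Omega_S(w))}.

    message_fn plays the role of m_t, mapping a set of states (the sender's cell)
    to a message. Pairing the old label with the message implements the intersection.
    """
    sender_cells = cells(sender)
    msg = [message_fn(sender_cells[sender[w]]) for w in range(len(sender))]
    return [(receiver[w], msg[w]) for w in range(len(receiver))]

# Hypothetical toy instance: Omega = {0,1} x {0,1}, uniform prior, f(x, y) = x.
states = list(product((0, 1), repeat=2))
prior = [Fraction(1, 4)] * len(states)
f = [Fraction(x) for x, _ in states]
alice0 = [x for x, _ in states]  # Alice sees x
bob0 = [y for _, y in states]    # Bob sees y

def m_expectation(S):
    """m_t(S) = EX_{w' in S}[f(w')], the sender's current expectation of f."""
    return sum(prior[w] * f[w] for w in S) / sum(prior[w] for w in S)

bob1 = receive(bob0, alice0, m_expectation)
print(len(set(bob0)), "->", len(set(bob1)))  # Bob's partition: 2 cells -> 4 cells
```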
At this point we should address an obvious question: how do Alice and Bob know each other's initial partitions, $\{\Omega_{A,0}(\omega)\}_{\omega\in\Omega}$ and $\{\Omega_{B,0}(\omega)\}_{\omega\in\Omega}$? If the agents do not know each other's partitions, then messages between them are useless, since neither agent knows how to update its own partition based on the other's messages. This question is not specific to our setting; it can be asked about Aumann's original result as well as any of its extensions. The solution in each case is that the state of the world $\omega\in\Omega$ includes the agents' mental states as part of it. From this it follows that every agent has a uniquely defined partition known to every other agent. For suppose Alice calculates that if the state of the world is $\omega$, then Bob's knowledge is $\Omega_{B,0}(\omega)$, meaning that he knows (and knows only) that the state belongs to $\Omega_{B,0}(\omega)$. Then for all $\omega'\in\Omega_{B,0}(\omega)$, she must calculate that if the state is $\omega'$, then Bob's knowledge is $\Omega_{B,0}(\omega)$ as well. Otherwise one of her calculations was mistaken.



The reader might object on the following grounds. Suppose Alice and Bob are the only two agents, and let $\Omega^{(0)}$ be the set of possible states of the "external" world, meaning everything except Alice and Bob. Next let $\Omega^{(1)}$ be the set of possible states of the agents' knowledge regarding $\Omega^{(0)}$, let $\Omega^{(2)}$ be the set of possible states of their knowledge regarding $\Omega^{(1)}$, and so on. Then $\Omega=\Omega^{(0)}\times\Omega^{(1)}\times\Omega^{(2)}\times\cdots$, which contradicts the assumption that $\Omega$ is finite. The obvious response is that, since the agents' brains can store only finitely many bits, not all elements of $\Omega^{(0)}\times\Omega^{(1)}\times\Omega^{(2)}\times\cdots$ are actually possible.

However, the above response is open to a different objection, related to the diagonalization arguments 
of Gödel and Turing. Suppose Alice's and Bob's brains store n bits each. Then in order to reason about
the set of possible states of their brains, wouldn't they need brains that store more than n bits? We leave 
this conundrum unresolved, confining ourselves to the following three remarks. First, only a tiny portion 
of the agents' brains is likely to be relevant to their topic of conversation, which means "plenty of room 
left over" for metareasoning about knowledge. Second, by reducing the number of brain states that the 
agents need to consider, our results in Section 4 will lessen the force of the self-reference argument, though
not eliminate it. Third, the agents' "knowledge hierarchy" seems likely to collapse at a low level. That 
is, Alice might have little idea what sort of evidence shaped Bob's opinions about the external world. But 
Bob probably has some idea what sort of evidence shaped Alice's opinions about Bob's opinions, and Alice 
probably has a good idea what sort of evidence shaped Bob's opinions about Alice's opinions about Bob's 
opinions (assuming Bob even has nontrivial such opinions). The more indirect the knowledge, the fewer the 
ways of obtaining it. 

Let us return to explaining the communication complexity model. After the $t$-th message, we say Alice and Bob $(\varepsilon,\delta)$-agree if their expectations of $f$ agree to within $\varepsilon$ with probability at least $1-\delta$; that is, if

$$\Pr_{\omega\in\mathcal{D}}\left[|E_{A,t}(\omega)-E_{B,t}(\omega)|>\varepsilon\right]\le\delta.$$

The goal will be to minimize the number of messages until the agents $(\varepsilon,\delta)$-agree.

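For finite $\Omega$ the $(\varepsilon,\delta)$-agreement condition is a single weighted sum. A minimal Python sketch, with hypothetical function names and made-up expectation values (not taken from any protocol in the paper):

```python
from fractions import Fraction

def disagreement_probability(prior, E_A, E_B, eps):
    """Pr_{w ~ D}[ |E_A(w) - E_B(w)| > eps ]."""
    return sum(p for p, a, b in zip(prior, E_A, E_B) if abs(a - b) > eps)

def eps_delta_agree(prior, E_A, E_B, eps, delta):
    """(eps, delta)-agreement: expectations within eps with probability at least 1 - delta."""
    return disagreement_probability(prior, E_A, E_B, eps) <= delta

# Made-up numbers: four equally likely states; the gaps are 0, 1/10, 0, 3/10.
prior = [Fraction(1, 4)] * 4
E_A = [Fraction(1, 2), Fraction(1, 2), Fraction(3, 10), Fraction(9, 10)]
E_B = [Fraction(1, 2), Fraction(2, 5), Fraction(3, 10), Fraction(3, 5)]
print(eps_delta_agree(prior, E_A, E_B, Fraction(1, 4), Fraction(1, 4)))   # only state 3 (mass 1/4) differs by > 1/4
print(eps_delta_agree(prior, E_A, E_B, Fraction(1, 20), Fraction(1, 4)))  # states 1 and 3 (mass 1/2) differ by > 1/20
```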
In our view, $(\varepsilon,\delta)$-agreement is a much more fundamental notion than exact agreement. For suppose
$f$ represents the probability that global warming, if left unchecked, will cause sea levels to rise at least 30
centimeters by the year 2100. If after an hour's conversation, any two people could agree within 1/4 about
$f$ with probability at least 3/4, then the world would be a remarkably different place than it now is. That
the agreement was inexact and uncertain would be less significant than the fact that it occurred at all. 

But why do we calculate the success probability over $\mathcal{D}$, and not some other distribution? In other words, what if the agents' priors agree with each other, but not with external reality? Unfortunately, in that case it seems difficult to prove anything, since the "true" prior could be concentrated on a few states that the agents consider vanishingly unlikely. Furthermore, we conjecture that there exist $f,\mathcal{D}$ such that for all agreement protocols, the agents must exchange $\Omega(n)$ bits to agree within $\varepsilon$ on every state $\omega$ (that is, to $(\varepsilon,0)$-agree). So given a protocol that causes Alice and Bob to $(\varepsilon,\delta)$-agree, what we should really say is that both agents enter the conversation expecting to agree within $\varepsilon$ with probability at least $1-\delta$. This, of course, is profoundly unlike the situation in real life, where adversaries generally do not enter arguments expecting to convince or to be convinced.

Let us make two further remarks about the model. First, if the agents want to agree exactly (that is, 
(0, 0)-agree), it is clear that in the worst case they need 2n bits of communication, n from Alice and n from 
Bob. Note the contrast with ordinary communication complexity, where n bits always suffice. Indeed, 
even to produce approximate agreement, two-way communication is necessary in general, as shown by the 
example $f(x,y)=(2x+y)/3$, where $x,y\in\{0,1\}$ are uniformly distributed.

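The example $f(x,y)=(2x+y)/3$ can be checked by brute force. The Python sketch below (helper names hypothetical) considers the most informative one-way exchange, in which the speaking agent reveals its whole input, and prints the gap $|E_A-E_B|$ on each of the four states. It illustrates the claim for this specific pair of one-way exchanges; it is not a proof for arbitrary one-way protocols.

```python
from fractions import Fraction
from itertools import product

STATES = list(product((0, 1), repeat=2))  # (x, y), uniformly distributed

def f(x, y):
    return Fraction(2 * x + y, 3)

def expectation(consistent):
    """Average of f over the states an agent still considers possible."""
    cell = [s for s in STATES if consistent(s)]
    return sum(f(*s) for s in cell) / len(cell)

def gaps(alice_knows_y, bob_knows_x):
    """|E_A - E_B| on each state, when the flagged inputs have been fully revealed."""
    out = []
    for x, y in STATES:
        E_A = expectation(lambda s: s[0] == x and (not alice_knows_y or s[1] == y))
        E_B = expectation(lambda s: s[1] == y and (not bob_knows_x or s[0] == x))
        out.append(abs(E_A - E_B))
    return out

print(gaps(alice_knows_y=False, bob_knows_x=True))  # only Alice speaks: gap 1/6 everywhere
print(gaps(alice_knows_y=True, bob_knows_x=False))  # only Bob speaks: gap 1/3 everywhere
print(gaps(alice_knows_y=True, bob_knows_x=True))   # both speak: gap 0
```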
Second, our ending condition is simply that the agents $(\varepsilon,\delta)$-agree at some step $t$. We do not require
them to fix this $t$ independently of $f$ and $\mathcal{D}$. The reason is that for any $i$, there might exist perverse $f,\mathcal{D}$
such that the agents nearly agree for the first $i-1$ steps, then disagree violently at the $i$-th step. However,
it seems unfair to penalize the agents in such cases. 

The following is the best lower bound we are able to show on agreement complexity. 

Proposition 3 There exist $f,\mathcal{D}$ such that for all $\varepsilon>2^{-n}$ and $\delta>0$, Alice must send $\Omega\!\left(\log\frac{1-\delta}{\varepsilon}\right)$ bits to Bob and Bob must send $\Omega\!\left(\log\frac{1-\delta}{\varepsilon}\right)$ bits to Alice before the agents $(\varepsilon,\delta)$-agree. In particular, if $\delta$ is bounded away from 1 by a constant, then $\Omega(\log 1/\varepsilon)$ bits are needed.

Proof. Let $\Omega=\{1,\ldots,2^n\}^2$, let $\mathcal{D}$ be uniform over $\Omega$, and let $f(x,y)=(x+y)/2^{n+1}$ for all $(x,y)\in\Omega$. Thus if $\hat{x}$ is Bob's expectation of $x$ at step $t$ and $\hat{y}$ is Alice's expectation of $y$, then $E_{A,t}=(x+\hat{y})/2^{n+1}$ and $E_{B,t}=(\hat{x}+y)/2^{n+1}$. Suppose one agent, say Alice, has sent only $t<\log_2\frac{1-\delta}{4\varepsilon}$ bits to Bob. For each $i\in\{1,\ldots,2^t\}$, let $p_i$ be the probability of the $i$-th message sequence from Alice. Conditioned on $i$, there are $2^n p_i$ values of $x$ still possible from Bob's point of view. So regardless of $E_{B,t}$, the probability of $|E_{A,t}-E_{B,t}|\le\varepsilon$ can be at most $4\varepsilon/p_i$. Therefore the agents agree within $\varepsilon$ with total probability at most

$$\sum_{i=1}^{2^t}p_i\cdot\frac{4\varepsilon}{p_i}=4\varepsilon\,2^t<1-\delta.$$

∎

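The counting in the proof of Proposition 3 reduces to a short calculation. This Python sketch (function names hypothetical) evaluates the agreement-probability ceiling $4\varepsilon 2^t$ and the resulting minimum number of bits $\lceil\log_2\frac{1-\delta}{4\varepsilon}\rceil$ for sample values of $\varepsilon$ and $\delta$; it computes the bound from the proof and does not simulate a protocol.

```python
import math

def agreement_ceiling(eps, t):
    """From the proof: after t bits from one agent, Pr[agree within eps] <= 4 * eps * 2**t."""
    return min(1.0, 4 * eps * 2**t)

def min_bits(eps, delta):
    """Smallest integer t with 4 * eps * 2**t >= 1 - delta, i.e. t >= log2((1 - delta) / (4 * eps))."""
    return max(0, math.ceil(math.log2((1 - delta) / (4 * eps))))

eps, delta = 2.0**-10, 0.5  # the proposition needs eps > 2**-n, so n >= 11 here
t = min_bits(eps, delta)
print(t)                               # 7
print(agreement_ceiling(eps, t - 1))   # 0.25, below 1 - delta = 0.5
```

With $\varepsilon=2^{-10}$ and $\delta=1/2$, six bits leave agreement probability at most $1/4<1-\delta$, so at least seven bits must pass in each direction.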
3.1 Convergence of the Standard Protocol 

The two-player "standard protocol" is simply the following: first Alice sends $E_{A,0}$, her current expectation of $f$, to Bob. Then Bob sends his expectation $E_{B,1}$ to Alice, then Alice sends $E_{A,2}$ to Bob, and so on. Geanakoplos and Polemarchakis [8] observed that for any $f,\mathcal{D}$, if the agents use the standard protocol then after a finite number of messages $T$, they will reach consensus, meaning that $E_{A,T}=E_{B,T}$, both agents know this, both know that they know it, etc. In particular, in our terminology Alice and Bob $(0,0)$-agree. In this section we ask how many messages are needed before the agents $(\varepsilon,\delta)$-agree. The surprising answer, in Theorem 5, is that $1/(\delta\varepsilon^2)$ messages always suffice, independently of n and all other parameters of $f$ and $\mathcal{D}$. One might guess that, since the expectations $E_{A,0},E_{B,1},\ldots$ are real numbers, the cost of communication must be hidden in the length of the messages. However, in Theorem 6 we show that even if the agents send only 2-bit "summaries" of their expectations, $O(1/(\delta\varepsilon^2))$ messages still suffice for $(\varepsilon,\delta)$-agreement.

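The following Python sketch runs the standard protocol on a hypothetical random instance with 36 states, using exact rational arithmetic, until the two agents' expectations coincide on every state. The representation and the names `standard_protocol` and `canonical` are our own illustrative choices, and the message cap `max_messages` is only a safeguard.

```python
import random
from collections import defaultdict
from fractions import Fraction

def cond_expectation(prior, f, labels):
    """E(w): average of f over w's cell, where a cell is a set of states sharing a label."""
    mass, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for w, cell in enumerate(labels):
        mass[cell] += prior[w]
        weighted[cell] += prior[w] * f[w]
    return [weighted[cell] / mass[cell] for cell in labels]

def canonical(labels):
    """Rename cells to small integers so labels do not grow into nested tuples."""
    ids = {}
    return [ids.setdefault(lab, len(ids)) for lab in labels]

def standard_protocol(prior, f, alice, bob, max_messages=1000):
    """Alternate announcements E_{A,0}, E_{B,1}, E_{A,2}, ...

    Returns the number of messages after which E_A and E_B coincide on every
    state (exact consensus), or None if that has not happened by max_messages.
    """
    parts = {"A": list(alice), "B": list(bob)}
    sender, receiver = "A", "B"
    for t in range(max_messages + 1):
        E = {i: cond_expectation(prior, f, parts[i]) for i in parts}
        if E["A"] == E["B"]:
            return t
        # The receiver intersects its cell with {w' : E_sender(w') = E_sender(w)}.
        parts[receiver] = canonical(
            [(parts[receiver][w], E[sender][w]) for w in range(len(f))]
        )
        sender, receiver = receiver, sender
    return None

# Hypothetical random instance: state w encodes (x, y) = divmod(w, 6).
rng = random.Random(2)
N = 6
weights = [rng.randint(1, 9) for _ in range(N * N)]
prior = [Fraction(wt, sum(weights)) for wt in weights]
f = [Fraction(rng.randint(0, 10), 10) for _ in range(N * N)]
alice = [w // N for w in range(N * N)]  # Alice knows x
bob = [w % N for w in range(N * N)]     # Bob knows y
steps = standard_protocol(prior, f, alice, bob)
print("messages until exact consensus:", steps)
```

On a finite $\Omega$ the loop must stop: whenever two consecutive announcements refine neither partition, both expectations are constant on the cells of the common-knowledge partition and therefore equal.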


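Given any function $F:\Omega\to[0,1]$, let $\|F\|^2=\operatorname{EX}_{\omega\in\mathcal{D}}[F(\omega)^2]$. The following proposition will be used again and again in this paper.

Proposition 4 Suppose the partition $\{\Omega_{i,t}(\omega)\}_{\omega\in\Omega}$ refines $\{\Theta_{j,u}(\omega)\}_{\omega\in\Omega}$. Then

$$\|E_{i,t}\|^2=\|E_{j,u}\|^2+\|E_{i,t}-E_{j,u}\|^2,$$

so in particular, $\|E_{i,t}\|^2\ge\|E_{j,u}\|^2$. A special case is that $\|E_{i,t+1}\|^2\ge\|E_{i,t}\|^2$ for all $i,t$.

Proof. We have

$$\operatorname{EX}_{\omega\in\mathcal{D}}[E_{i,t}E_{j,u}]=\operatorname{EX}_{\omega\in\mathcal{D}}\left[E_{j,u}(\omega)\operatorname{EX}_{\omega'\in\Theta_{j,u}(\omega)}[E_{i,t}(\omega')]\right]=\|E_{j,u}\|^2$$

by Proposition 1, and therefore

$$\|E_{i,t}-E_{j,u}\|^2=\|E_{i,t}\|^2-2\operatorname{EX}_{\omega\in\mathcal{D}}[E_{i,t}E_{j,u}]+\|E_{j,u}\|^2=\|E_{i,t}\|^2-\|E_{j,u}\|^2.$$

∎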



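Proposition 4 is an exact identity, so it can be checked with rational arithmetic. In this Python sketch (toy instance and helper names hypothetical), Bob's partition after hearing $E_{A,0}$ refines $\Theta_{A,0}$, and the script compares $\|E_{B,1}\|^2$ with $\|E_{A,0}\|^2+\|E_{B,1}-E_{A,0}\|^2$.

```python
import random
from collections import defaultdict
from fractions import Fraction

def cond_expectation(prior, f, labels):
    """Average of f over each state's cell (cells = states sharing a label)."""
    mass, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for w, cell in enumerate(labels):
        mass[cell] += prior[w]
        weighted[cell] += prior[w] * f[w]
    return [weighted[cell] / mass[cell] for cell in labels]

def sq_norm(prior, F):
    """||F||^2 = EX_{w ~ D}[F(w)^2]."""
    return sum(p * v * v for p, v in zip(prior, F))

# Hypothetical random instance: state w encodes (x, y) = divmod(w, 5).
rng = random.Random(3)
N = 5
weights = [rng.randint(1, 9) for _ in range(N * N)]
prior = [Fraction(wt, sum(weights)) for wt in weights]
f = [Fraction(rng.randint(0, 10), 10) for _ in range(N * N)]
alice = [w // N for w in range(N * N)]
bob = [w % N for w in range(N * N)]

E_A0 = cond_expectation(prior, f, alice)
# After hearing E_{A,0}, Bob's partition refines Theta_{A,0}.
E_B1 = cond_expectation(prior, f, [(bob[w], E_A0[w]) for w in range(N * N)])

lhs = sq_norm(prior, E_B1)
rhs = sq_norm(prior, E_A0) + sq_norm(prior, [b - a for a, b in zip(E_A0, E_B1)])
print(lhs == rhs)                   # exact, thanks to Fraction arithmetic
print(lhs >= sq_norm(prior, E_A0))  # hence ||E_{B,1}||^2 >= ||E_{A,0}||^2
```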
We can now prove an upper bound on the number of messages needed for agreement. 

Theorem 5 For all $f,\mathcal{D}$, the standard protocol causes Alice and Bob to $(\varepsilon,\delta)$-agree after at most $1/(\delta\varepsilon^2)$
messages.



Proof. Intuitively, so long as the agents disagree by more than $\varepsilon$ with high probability, Alice's expectation $E_{A,0},E_{A,2},\ldots$ follows an unbiased random walk with step size roughly $\varepsilon$. Furthermore, this walk has two absorbing barriers at 0 and 1, for the simple fact that $E_{A,t}\in[0,1]$. And we expect a random walk with step size $\varepsilon$ to hit a barrier after about $1/\varepsilon^2$ steps.

To make this intuition precise, we need only track the expectation, not of $E_A$ and $E_B$, but of $E_A^2$ and $E_B^2$. Suppose Alice sends the $t$-th message. Then Bob's partition $\{\Omega_{B,t}(\omega)\}_{\omega\in\Omega}$ refines $\{\Theta_{A,t-1}(\omega)\}_{\omega\in\Omega}$. It follows by Proposition 4 that

$$\|E_{B,t}\|^2=\|E_{A,t-1}\|^2+\|E_{B,t}-E_{A,t-1}\|^2.$$

Assume $\Pr[|E_{B,t}-E_{A,t-1}|>\varepsilon]>\delta$ (since $E_{A,t}=E_{A,t-1}$, this just says that the agents do not yet $(\varepsilon,\delta)$-agree after the $t$-th message). Then $\|E_{B,t}-E_{A,t-1}\|^2>\delta\varepsilon^2$, and hence $\|E_{B,t}\|^2>\|E_{A,t-1}\|^2+\delta\varepsilon^2$. Similarly, after Bob sends Alice the $(t+1)$-st message, we have $\|E_{A,t+1}\|^2>\|E_{B,t}\|^2+\delta\varepsilon^2$. So until the agents $(\varepsilon,\delta)$-agree, each message increases $\max\{\|E_{A,t}\|^2,\|E_{B,t}\|^2\}$ by more than $\delta\varepsilon^2$. But the maximum can never exceed 1 (since $E_{A,t},E_{B,t}\in[0,1]$), which yields an upper bound of $1/(\delta\varepsilon^2)$ on the number of messages. ∎

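The potential argument in this proof can be instrumented directly. The Python sketch below (instance, parameter values, and names hypothetical) runs the standard protocol until $(\varepsilon,\delta)$-agreement. At every step where agreement still fails, it asserts that the receiver's $\|E\|^2$ exceeds the sender's by more than $\delta\varepsilon^2$, and it reports the number of messages used so that it can be compared with $1/(\delta\varepsilon^2)$.

```python
import random
from collections import defaultdict
from fractions import Fraction

def cond_expectation(prior, f, labels):
    """Average of f over each state's cell (cells = states sharing a label)."""
    mass, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for w, cell in enumerate(labels):
        mass[cell] += prior[w]
        weighted[cell] += prior[w] * f[w]
    return [weighted[cell] / mass[cell] for cell in labels]

def sq_norm(prior, F):
    """||F||^2 = EX_{w ~ D}[F(w)^2]."""
    return sum(p * v * v for p, v in zip(prior, F))

def disagreement(prior, E1, E2, eps):
    """Pr_{w ~ D}[|E1(w) - E2(w)| > eps]."""
    return sum(p for p, a, b in zip(prior, E1, E2) if abs(a - b) > eps)

def messages_until_agreement(prior, f, alice, bob, eps, delta, max_messages=10_000):
    """Standard protocol; returns the first t at which the agents (eps, delta)-agree."""
    parts = {"A": list(alice), "B": list(bob)}
    E = {i: cond_expectation(prior, f, parts[i]) for i in parts}
    if disagreement(prior, E["A"], E["B"], eps) <= delta:
        return 0
    sender, receiver = "A", "B"
    for t in range(1, max_messages + 1):
        # Message t: the receiver conditions on the sender's announced expectation.
        parts[receiver] = [(parts[receiver][w], E[sender][w]) for w in range(len(f))]
        E[receiver] = cond_expectation(prior, f, parts[receiver])
        # The sender's expectation did not change, so this is the test for step t.
        if disagreement(prior, E["A"], E["B"], eps) <= delta:
            return t
        # Still disagreeing: by Proposition 4 the receiver's squared norm must exceed
        # the sender's by more than delta * eps^2 (the potential argument of the proof).
        assert sq_norm(prior, E[receiver]) > sq_norm(prior, E[sender]) + delta * eps**2
        sender, receiver = receiver, sender
    return None

# Hypothetical random instance with 64 states: state w encodes (x, y) = divmod(w, 8).
rng = random.Random(5)
N = 8
weights = [rng.randint(1, 9) for _ in range(N * N)]
prior = [Fraction(wt, sum(weights)) for wt in weights]
f = [Fraction(rng.randint(0, 10), 10) for _ in range(N * N)]
alice = [w // N for w in range(N * N)]
bob = [w % N for w in range(N * N)]
eps, delta = Fraction(1, 20), Fraction(1, 10)

t = messages_until_agreement(prior, f, alice, bob, eps, delta)
print(t, "messages; Theorem 5 bound:", 1 / (delta * eps**2))
```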
As mentioned previously, the trouble with the standard protocol is that sending one's expectation might 
require too many bits. A simple way to discretize the protocol is as follows. Imagine a "monkey in the 
middle," Charlie, who has the same prior distribution T) as Alice and Bob and who sees all messages between 
them, but who does not know either of their inputs. In other words, letting ^c,t (w) be the set of states 
that Charlie considers possible after the first t messages, we have ^cfi {'^) = ^ for all lu. Then the partition 
{fic,t}ijgf2 coarsens both {^A,t]^fz^i and {Q,B,t]^^^^\ therefore both Alice and Bob can compute Charlie's 
expectation Ec,t (w) = EX^,gOp^(^) [/ {u')] of /. 

Now whenever it is her turn to send a message to Bob, Alice sends the message "high" if $E_{A,t} > E_{C,t} + \varepsilon/4$, "low" if $E_{A,t} < E_{C,t} - \varepsilon/4$, and "medium" otherwise. This requires 2 bits. Likewise, Bob sends "high" if $E_{B,t} > E_{C,t} + \varepsilon/4$, "low" if $E_{B,t} < E_{C,t} - \varepsilon/4$, and "medium" otherwise.
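The discretization rule is simple enough to state as code. The sketch below assumes a finite state space with a uniform prior (so Charlie's expectation is a plain average over his cell); `charlie_expectation` and `discretized_message` are hypothetical names, not taken from the text.

```python
from fractions import Fraction


def charlie_expectation(f, charlie_cells):
    """Charlie's expectation E_C at every state (uniform prior, finite states)."""
    sums, counts = {}, {}
    for w, c in enumerate(charlie_cells):
        sums[c] = sums.get(c, 0) + f[w]
        counts[c] = counts.get(c, 0) + 1
    return [Fraction(sums[c]) / counts[c] for c in charlie_cells]


def discretized_message(E_own, E_charlie, eps):
    """Three-valued message of the discretized protocol (fits in 2 bits).

    The same rule is used by Alice (with E_own = E_A) and Bob (E_own = E_B).
    """
    if E_own > E_charlie + eps / 4:
        return "high"
    if E_own < E_charlie - eps / 4:
        return "low"
    return "medium"
```

Because both agents can compute Charlie's expectation, the only thing transmitted is which of the three bands the sender's own expectation falls into.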



Theorem 6 For all $f, \mathcal{D}$, the discretized protocol described above causes Alice and Bob to $(\varepsilon,\delta)$-agree after $O(1/(\delta\varepsilon^2))$ messages.



Proof. The plan is to show that either $\|E_{A,t}\|_2^2$, $\|E_{B,t}\|_2^2$, or $\|E_{C,t}\|_2^2$ increases by at least $\delta\varepsilon^2/512$ with every message of Alice's, until Alice and Bob $(\varepsilon,\delta)$-agree. Since $\|E_{i,t}\|_2^2 \le 1$ for all $i$, this will imply an upper bound of $3072/(\delta\varepsilon^2)$ on the number of messages (we did not optimize the constant!).

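Assume that $\Pr[|E_{A,t} - E_{B,t}| > \varepsilon] > \delta$ and it is Alice's turn to send the $(t+1)$st message. By the triangle inequality, either

$$\Pr\left[|E_{A,t} - E_{C,t}| > \frac{\varepsilon}{2}\right] > \frac{\delta}{2} \quad\text{or}\quad \Pr\left[|E_{B,t} - E_{C,t}| > \frac{\varepsilon}{2}\right] > \frac{\delta}{2}.$$

We analyze these two cases separately. In the first case, with probability at least $\delta/2$ Alice's message is either "high" or "low." If the message is "high," then $E_{C,t+1}$ becomes an average of numbers each greater than $E_{C,t} + \varepsilon/4$, so $E_{C,t+1} > E_{C,t} + \varepsilon/4$. If the message is "low," then likewise $E_{C,t+1} < E_{C,t} - \varepsilon/4$. Since $\{\Omega_{C,t+1}(\omega)\}_{\omega\in\Omega}$ refines $\{\Omega_{C,t}(\omega)\}_{\omega\in\Omega}$, Proposition 3 thereby gives

$$\|E_{C,t+1}\|_2^2 - \|E_{C,t}\|_2^2 = \|E_{C,t+1} - E_{C,t}\|_2^2 > \frac{\delta}{2}\left(\frac{\varepsilon}{4}\right)^2.$$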



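Now for the second case. If, after Alice sends the $(t+1)$st message, we still have

$$\Pr\left[|E_{B,t+1} - E_{C,t+1}| > \frac{\varepsilon}{4}\right] > \frac{\delta}{4},$$

then the previous argument applied to Bob implies that

$$\|E_{C,t+2}\|_2^2 - \|E_{C,t+1}\|_2^2 > \frac{\delta}{4}\left(\frac{\varepsilon}{4}\right)^2$$

and we are done. So suppose otherwise. Then the difference between Bob's and Charlie's expectations must have changed significantly:

$$\Pr\left[|E_{B,t} - E_{C,t}| - |E_{B,t+1} - E_{C,t+1}| > \frac{\varepsilon}{4}\right] > \frac{\delta}{4}.$$

Hence by another application of the triangle inequality, either

$$\Pr\left[|E_{B,t+1} - E_{B,t}| > \frac{\varepsilon}{8}\right] > \frac{\delta}{8} \quad\text{or}\quad \Pr\left[|E_{C,t+1} - E_{C,t}| > \frac{\varepsilon}{8}\right] > \frac{\delta}{8}.$$

In the former case, Proposition 3 yields

$$\|E_{B,t+1}\|_2^2 - \|E_{B,t}\|_2^2 = \|E_{B,t+1} - E_{B,t}\|_2^2 > \frac{\delta}{8}\left(\frac{\varepsilon}{8}\right)^2,$$

while in the latter case,

$$\|E_{C,t+1}\|_2^2 - \|E_{C,t}\|_2^2 > \frac{\delta}{8}\left(\frac{\varepsilon}{8}\right)^2.$$

In every case one of the three quantities grows by at least $(\delta/8)(\varepsilon/8)^2 = \delta\varepsilon^2/512$, as claimed. ■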



3.2 Attenuated Protocol 

We have seen that two agents, using the standard protocol, will always $(\varepsilon,\delta)$-agree after exchanging only $O(1/(\delta\varepsilon^2))$ messages. This result immediately raises three questions:

(1) Is there a scenario where the standard protocol needs about $1/\varepsilon^2$ messages to produce $(\varepsilon,\delta)$-agreement?

(2) Is the standard protocol always optimal, or do other protocols sometimes outperform it?

(3) Is there a scenario where any agreement protocol needs a number of communication bits polynomial in $1/\varepsilon$?

Although we leave question (3) open, in this section we resolve questions (1) and (2). In particular, assume for simplicity that $\delta = 1/2$. Then for all $\varepsilon > 0$, Theorem 7 gives a scenario where the standard protocol uses almost $1/\varepsilon^2$ messages, even if the messages are continuous rather than discrete. By contrast, a new "attenuated protocol" uses only 2 messages, both consisting of a constant number of bits (independent of $\varepsilon$). Theorem 8 then gives a fixed scenario where for all $\varepsilon > 0$, the standard protocol uses almost $1/\varepsilon^2$ messages, while the attenuated protocol uses only 2 messages, both consisting of $O(1/\varepsilon)$ bits.

The attenuated protocol is interesting in its own right. The idea is to imagine that in the standard 
protocol, the communication channel between Alice and Bob becomes gradually more noisy as time goes on, 
so that each message conveys slightly less information than the one before. It turns out that in some cases, 
such noise would actually help! For intuitively, each time the message intensity decreases by $\varepsilon$, the "price"
the agents pay in terms of disagreement is proportional to $\varepsilon^2$. So it is better for them to attenuate their
conversation gradually, than to send a sequence of "maximum-intensity" messages followed by no message
(which we can think of as intensity 0).* Even if the noise that produces this strange effect is missing from
the channel, the agents can easily simulate it. Furthermore, the messages will turn out to be nonadaptive, 
so they can all be concatenated into one message from Alice and one from Bob. 

But how do we ensure that the standard protocol needs almost $1/\varepsilon^2$ messages? Intuitively, by forcing the random walk behavior of Section 3.1 actually to occur. That is, at the beginning there will be a disagreement that can only be resolved by Alice sending a bit to Bob. But then that bit will cause a new disagreement even as it resolves the old one, and so on.

Theorem 7 For all $\varepsilon > 0$, there exist $f, \mathcal{D}$ such that for all $\delta > 0$:

(i) Using the standard protocol, Alice and Bob need to exchange $\Omega\left(\frac{1}{\varepsilon^2 \log \frac{1}{(1-\delta)\varepsilon}}\right)$ messages before they $(\varepsilon,\delta)$-agree.

(ii) Using a different protocol, they need only exchange 2 messages, both consisting of $O(\log 1/\delta)$ bits.

In particular, if $\delta = 1/2$ then the standard protocol needs $\Omega\left(\frac{1}{\varepsilon^2 \log 1/\varepsilon}\right)$ bits whereas the attenuated protocol needs $O(1)$ bits.

* The same phenomenon occurs in the "Zeno effect" of quantum mechanics.
Figure 2: Alice's expectation $E_{A,t}$ (solid line), and Bob's expectation $E_{B,t}$ (dashed line), as a function of $t$.



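Proof. Let

$$n = \frac{1}{64\varepsilon^2 \ln \frac{1}{(1-\delta)\varepsilon^2}}$$

(throughout we omit floor and ceiling signs for convenience). The state space $\Omega$ consists of all pairs $(x,y)$, where $x = x_1 \ldots x_n$ and $y = y_1 \ldots y_n$ belong to $\{-1,1\}^n$. The prior distribution $\mathcal{D}$ is uniform over $\Omega$. Let

$$F(x,y) = \frac{1}{2} + 2\varepsilon \sum_{i=1}^{n} \left(y_{i-1} x_i + x_i y_i\right),$$

where $y_0 = 1$. Then the function that interests the agents is

$$f(x,y) = \begin{cases} F(x,y) & \text{if } F(x,y) \in [0,1], \\ 0 & \text{if } F(x,y) < 0, \\ 1 & \text{if } F(x,y) > 1. \end{cases}$$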



For simplicity, we first consider $F$ (which need not be bounded in $[0,1]$), and later analyze the "edge effects" that arise in switching to $f$. We claim that, if the agents use the continuous standard protocol to evaluate $F$, then $|E_{A,t} - E_{B,t}| = 2\varepsilon$ at all steps $t < 2n$, where $E_{A,t}$ and $E_{B,t}$ are Alice's and Bob's expectations of $F$ respectively after $t$ messages have been exchanged. For initially $E_{A,0} = 1/2 + 2\varepsilon x_1$ and $E_{B,0} = 1/2$. Most of the terms in the sum defining $F(x,y)$ simply average to 0 for both agents, since Alice does not know the $y_i$'s and Bob does not know the $x_i$'s. In the first step, however, the expectation that Alice sends to Bob reveals $x_1$ to him. This causes $E_{B,1}$ to become $1/2 + 2\varepsilon x_1 + 2\varepsilon x_1 y_1$, which differs from $E_{A,0} = 1/2 + 2\varepsilon x_1$ by $2\varepsilon$. Then in the second step, the expectation that Bob sends to Alice reveals $y_1$ to her, thereby "unlocking" the terms $x_1 y_1$ and $y_1 x_2$ in her expectation, and so on. It follows that until all $2n$ bits $x_1 \ldots x_n$ and $y_1 \ldots y_n$ have been exchanged, the agents disagree by $2\varepsilon$ with certainty (see Figure 2).
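The "unlocking" argument can be checked mechanically. The sketch below assumes, as the paragraph argues, that message $s$ reveals the $s$th bit in the order $x_1, y_1, x_2, y_2, \ldots$; it then computes each agent's expectation of $F$ by keeping only the terms whose two factors that agent knows (a product with an unknown uniform $\pm 1$ factor averages to 0). The function name `expectations_of_F` is hypothetical.

```python
def expectations_of_F(x, y, eps, t):
    """Alice's and Bob's expectations of F after t messages of the standard protocol.

    Assumes message s reveals the s-th bit in the order x1, y1, x2, y2, ...
    (Alice speaks first). x and y are lists of +1/-1 values.
    """
    n = len(x)
    val = {("y", 0): 1}  # y_0 = 1 is public
    for i in range(1, n + 1):
        val[("x", i)] = x[i - 1]
        val[("y", i)] = y[i - 1]
    order = [(v, i) for i in range(1, n + 1) for v in ("x", "y")]
    revealed = set(order[:t])
    # F = 1/2 + 2*eps * sum_i (y_{i-1} x_i + x_i y_i)
    terms = []
    for i in range(1, n + 1):
        terms.append((("y", i - 1), ("x", i)))
        terms.append((("x", i), ("y", i)))

    def expectation(own):
        def known(var):
            return var[0] == own or var in revealed or var == ("y", 0)
        # Only terms with both factors known contribute; the rest average to 0.
        s = sum(val[a] * val[b] for a, b in terms if known(a) and known(b))
        return 0.5 + 2 * eps * s

    return expectation("x"), expectation("y")
```

For any choice of bits, the gap $|E_{A,t} - E_{B,t}|$ returned by this sketch is $2\varepsilon$ for every $t < 2n$ and 0 at $t = 2n$.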

In switching from $F$ to $f$, the key observation is that Alice's expectation $E_{A,t}(f)$ of $f$ is a function of her expectation

$$E_{A,t} = \frac{1}{2} + 2\varepsilon\left(x_1 + x_1 y_1 + y_1 x_2 + \cdots + x_{(t-1)/2}\, y_{(t-1)/2} + y_{(t-1)/2}\, x_{(t+1)/2}\right)$$

of $F$. For from Alice's point of view, the later terms $x_{(t+1)/2}\, y_{(t+1)/2}$, $y_{(t+1)/2}\, x_{(t+3)/2}$, and so on are steps in an unbiased random walk with starting point $E_{A,t}$, step size $2\varepsilon$, and "snapping barriers" at 0 and 1. (A snapping barrier is neither absorbing nor reflecting: it allows a particle through, but if the particle is found on the wrong side of the barrier after the walk ends, then the particle is moved back to the barrier.) Let $E^*_{A,t}$ be the ending point of this walk; then $E_{A,t}(f) = \mathrm{EX}[E^*_{A,t}]$ is a function of $E_{A,t}$. Likewise, $E_{B,t}(f) = \mathrm{EX}[E^*_{B,t}]$ is the expected ending point of an unbiased walk with starting point $E_{B,t} = E_{A,t} + 2\varepsilon\, x_{(t+1)/2}\, y_{(t+1)/2}$, step size $2\varepsilon$, and snapping barriers at 0 and 1.
The lower bound for the standard protocol now follows from two claims: first, that $E_{A,t} \in [1/4, 3/4]$ and $E_{B,t} \in [1/4, 3/4]$ for all $t \in \{0, \ldots, 2n\}$ with probability at least $\delta$. Second, that whenever $E_{A,t}$ and $E_{B,t}$ belong to $[1/4, 3/4]$, we have $|E_{A,t}(f) - E_{B,t}(f)| > \varepsilon$. For the first claim, choose $z_1, \ldots, z_{2n}$ uniformly and independently from $\{0,1\}$; then Theorem 2 says that

$$\Pr\left[|z_1 + \cdots + z_{2n} - n| > \alpha(2n)\right] < 2e^{-2\alpha^2(2n)}.$$

Setting $\alpha = \frac{1}{8\varepsilon(2n)}$, this implies that for any fixed $t$

$$\Pr\left[|E_{A,t} - 1/2| > 1/4\right] < 2e^{-1/(64\varepsilon^2 n)} < \frac{1-\delta}{2(2n+1)},$$

and similarly for $E_{B,t}$. The claim now follows from the union bound. For the second claim, a bound similar to the above implies that

$$\Pr\left[|E^*_{A,t} - E_{A,t}| > 1/4\right] < 2e^{-1/(64\varepsilon^2 n)} < \frac{\varepsilon}{3},$$

and similarly for $E^*_{B,t}$. This in turn implies that $|E_{A,t}(f) - E_{A,t}| < \varepsilon/3$ and $|E_{B,t}(f) - E_{B,t}| < \varepsilon/3$, from which it follows that $|E_{A,t}(f) - E_{B,t}(f)| > \varepsilon$ by the triangle inequality.

We now give the $O(\log 1/\delta)$ upper bound. It suffices to give a protocol for $F$, since it is not hard to see that switching from $F$ to $f$ can only decrease $|E_{A,t} - E_{B,t}|$. Let $k = 8\ln(2/\delta)$. For each $i \in \{1, \ldots, k\}$, Alice sends Bob a bit that is uniformly random with probability $i/k$ and $x_i$ otherwise. Likewise, Bob sends Alice a bit that is uniformly random with probability $i/k$ and $y_i$ otherwise. Computing Alice's final expectation $E_{A,2}$ and Bob's final expectation $E_{B,2}$ of $F$ from these bits, one finds that

$$E_{B,2} - E_{A,2} = \frac{2\varepsilon}{k}\sum_{i=1}^{k} y_{i-1} x_i,$$

and hence

$$\Pr\left[|E_{A,2} - E_{B,2}| > \varepsilon\right] = \Pr\left[|z_1 + \cdots + z_k - k/2| > k/4\right],$$

where $z_i = (y_{i-1} x_i + 1)/2$. Since the $z_i$'s are uniform, independent samples from $\{0,1\}$, the above probability is at most $2e^{-2(1/4)^2 k} = \delta$ by Theorem 2. ■
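The message rule and the final tail estimate of the attenuated protocol are easy to check numerically. In the sketch below, `noisy_bit`, `attenuated_k`, and `disagreement_bound` are hypothetical names; $k$ is rounded up to an integer (an assumption, since the text omits ceilings), and the tail probability for $z_1 + \cdots + z_k$ is computed exactly from the binomial distribution so it can be compared with the exponential bound the text uses.

```python
import math
import random


def noisy_bit(bit, i, k, rng):
    """Attenuated message bit i: uniformly random w.p. i/k, otherwise the true bit."""
    return rng.choice((-1, 1)) if rng.random() < i / k else bit


def attenuated_k(delta):
    """Number of bits each agent sends: k = 8 ln(2/delta), rounded up here."""
    return math.ceil(8 * math.log(2 / delta))


def disagreement_bound(k):
    """Exact Pr[|Z - k/2| > k/4] for Z = z_1 + ... + z_k with z_i uniform on {0,1}."""
    bad = sum(math.comb(k, j) for j in range(k + 1) if abs(j - k / 2) > k / 4)
    return bad / 2 ** k


# Hypothetical usage: Alice's 12 attenuated bits when every x_i = 1.
rng = random.Random(0)
bits = [noisy_bit(1, i, 12, rng) for i in range(1, 13)]
```

Since $k \ge 8\ln(2/\delta)$, the bound $2e^{-k/8}$ is at most $\delta$, and the exact tail is smaller still.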

The main defect of Theorem 7 is that the function $f$ had to be tailored to a particular $\varepsilon$. The next theorem fixes this defect, although the advantage of the attenuated protocol over the standard one is not quite as dramatic as in Theorem 7. For simplicity, in stating the theorem we fix $\delta = 1/2$.

Theorem 8 For all $\gamma \in (0,1)$, there exist $f, \mathcal{D}$ such that for all $\varepsilon > 1/n^{1/(2-\gamma)}$:

(i) Using the standard protocol, Alice and Bob need to exchange $\Omega(1/\varepsilon^{2-\gamma})$ messages before they $(\varepsilon, 1/2)$-agree.

(ii) Using the attenuated protocol, they need only exchange 2 messages, both consisting of $O(1/\varepsilon)$ bits.

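Sketch. Again we let $\mathcal{D}$ be uniform over $x = x_1 \ldots x_n$ and $y = y_1 \ldots y_n$ in $\{-1,1\}^n$. We then let

$$F(x,y) = \frac{1}{2} + \frac{\sqrt{\gamma}}{10}\sum_{i=1}^{n} \frac{y_{i-1} x_i + x_i y_i}{i^{1/(2-\gamma)}}$$

and

$$f(x,y) = \begin{cases} F(x,y) & \text{if } F(x,y) \in [0,1], \\ 0 & \text{if } F(x,y) < 0, \\ 1 & \text{if } F(x,y) > 1. \end{cases}$$

The rest of the proof is almost identical to that of Theorem 7, so we omit it here.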
Figure 3: For a sample graph $G$, spanning tree $T_1$ is shown in solid lines, and $T_2$ in dashed lines.



3.3 $N$ Agents

We have seen that two Bayesian agents can reach rapid agreement, provided they communicate directly with each other. An obvious followup question is, what if there are three or more agents, each of which talks only to its "neighbors"? Will the agents still reach agreement, and if so, after how long?

Formally, let $G$ be a directed graph with vertices $1, \ldots, N$, each representing an agent. Suppose messages can only be sent from agent $i$ to agent $j$ if $(i,j)$ is an edge in $G$. We need to assume $G$ is strongly connected, since otherwise reaching agreement could be impossible for trivial reasons. In this setting, a standard protocol consists of a sequence of edges $(i_1, j_1), \ldots, (i_t, j_t), \ldots$ of $G$. At the $t$th step, agent $i_t$ sends its current expectation $E_{i_t,t-1}$ of $f$ to agent $j_t$, whereupon $j_t$ updates its expectation accordingly. Call the protocol fair if every edge occurs infinitely often in the sequence. Parikh and Krasucki [15] proved the following important theorem.

Theorem 9 ([15]) For all $f, \mathcal{D}$, any fair protocol will cause all the agents' expectations to agree after a finite number of messages.

Indeed, the agents will reach consensus after finitely many messages, meaning it will be common knowledge among them that $E_{1,t} = \cdots = E_{N,t}$. Here, though, we care only about the weaker condition of agreement.

Our goal is to cause every pair of agents to $(\varepsilon,\delta)$-agree,† after a number of steps polynomial in $N$, $1/\delta$, and $1/\varepsilon$. We can achieve this via the following "spanning-tree protocol." Let $T_1$ and $T_2$ be two spanning trees of $G$ of minimum diameter, both rooted at agent 1. As illustrated in Figure 3, $T_1$ points outward from 1 to the other $N-1$ agents; $T_2$ points inward back to 1. Let $O_1$ be an ordering of the edges of $T_1$, in which every edge originating at $i$ is preceded by an edge terminating at $i$, unless $i = 1$. Likewise let $O_2$ be an ordering of the edges of $T_2$, in which every edge originating at $i$ is preceded by an edge terminating at $i$, unless $i$ is a leaf of $T_2$. Then the protocol is simply for agents to send their current expectations along edges of $G$ in the order $O_1, O_2, O_1, O_2, \ldots$.
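One way to produce the orderings $O_1$ and $O_2$ is sketched below in Python. As an assumption, it uses breadth-first-search trees from agent 1 (on $G$ for $T_1$, and on $G$ with its edges reversed for $T_2$) as stand-ins for the shallow spanning trees the text asks for. Listing $T_1$'s edges in BFS order guarantees each agent hears before it forwards; listing $T_2$'s edges deepest-first guarantees each agent hears from its subtree before it reports toward the root. The names `bfs_tree_edges` and `spanning_tree_schedule` are hypothetical.

```python
from collections import deque


def bfs_tree_edges(adj, root):
    """Edges (parent, child) of a BFS tree over adjacency dict adj, in BFS order."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                order.append((u, v))
                queue.append(v)
    return order


def spanning_tree_schedule(edges, root=1):
    """One round O1, O2 of the spanning-tree protocol for a strongly connected digraph."""
    out_adj, in_adj = {}, {}
    for i, j in edges:
        out_adj.setdefault(i, []).append(j)
        in_adj.setdefault(j, []).append(i)
    o1 = bfs_tree_edges(out_adj, root)      # T1 points outward from the root
    inward = bfs_tree_edges(in_adj, root)   # (parent, child) pairs in the reversed graph
    # T2 edges point child -> parent; deepest edges come first.
    o2 = [(c, p) for p, c in reversed(inward)]
    return o1 + o2
```

On the directed triangle $1 \to 2 \to 3 \to 1$, one round is $(1,2),(2,3)$ followed by $(2,3),(3,1)$; running $O_2$ and then $O_1$ spreads every agent's information to every other agent.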

Theorem 10 For all $f, \mathcal{D}$, the spanning-tree protocol causes every pair of agents to $(\varepsilon,\delta)$-agree after $O\left(\frac{Nd^2}{\delta\varepsilon^2}\right)$ messages, where $d$ is the diameter of $G$.

Proof. We will track $\eta_t = \min_i \|E_{i,t}\|_2^2$. Observe that, if the $t$th message is from agent $i$ to agent $j$, then the partition $\{\Omega_{j,t+1}(\omega)\}_{\omega\in\Omega}$ refines both $\{\Omega_{i,t}(\omega)\}_{\omega\in\Omega}$ and $\{\Omega_{j,t}(\omega)\}_{\omega\in\Omega}$, and therefore

$$\|E_{j,t+1}\|_2^2 \ge \max\left\{\|E_{i,t}\|_2^2, \|E_{j,t}\|_2^2\right\}$$

by Proposition 3. Also observe that, in any window of $4N$ messages, the spanning-tree protocol "sends information" from every agent to every other. Together these observations imply that $\eta_{t+4N} \ge \max_i \|E_{i,t}\|_2^2$. So as long as there exists an $i$ such that $\|E_{i,t}\|_2^2 \gg \eta_t$, the protocol makes significant progress.

It may happen, though, that $\|E_{i,t}\|_2^2$ is nearly constant as we range over $i$. Assume $\Pr[|E_{i,t} - E_{j,t}| > \varepsilon] > \delta$ for two agents $i, j$. Consider a path from $i$ to $j$ in $G$, obtained by first following $i$ to 1 in $T_2$ and then following 1 to $j$ in $T_1$. This path has at most $2d$ edges. So by the triangle inequality, there exist consecutive agents $A, B$ along the path such that

$$\|E_{A,t} - E_{B,t}\|_2 \ge \frac{1}{2d}\|E_{i,t} - E_{j,t}\|_2 > \frac{\sqrt{\delta}\,\varepsilon}{2d},$$

where the last step uses the fact that $\Pr[|E_{i,t} - E_{j,t}| > \varepsilon] > \delta$ implies $\|E_{i,t} - E_{j,t}\|_2^2 > \delta\varepsilon^2$.

† If we want every pair of agents to agree within $\varepsilon$ with global probability $1-\delta$, then we want every pair to $(\varepsilon, \delta/N^2)$-agree.

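Imagine that the $t$th message is from $A$ to $B$. Then since $\{\Omega_{B,t+1}(\omega)\}_{\omega\in\Omega}$ refines both $\{\Omega_{A,t}(\omega)\}_{\omega\in\Omega}$ and $\{\Omega_{B,t}(\omega)\}_{\omega\in\Omega}$, Proposition 3 yields

$$\|E_{B,t+1} - E_{A,t}\|_2^2 = \|E_{B,t+1}\|_2^2 - \|E_{A,t}\|_2^2, \qquad \|E_{B,t+1} - E_{B,t}\|_2^2 = \|E_{B,t+1}\|_2^2 - \|E_{B,t}\|_2^2.$$

Also, by the triangle inequality either

$$\|E_{B,t+1} - E_{A,t}\|_2 \ge \frac{1}{2}\|E_{A,t} - E_{B,t}\|_2 \quad\text{or}\quad \|E_{B,t+1} - E_{B,t}\|_2 \ge \frac{1}{2}\|E_{A,t} - E_{B,t}\|_2.$$

Therefore

$$\|E_{B,t+1}\|_2^2 > \min\left\{\|E_{A,t}\|_2^2, \|E_{B,t}\|_2^2\right\} + \frac{1}{4}\left(\frac{\sqrt{\delta}\,\varepsilon}{2d}\right)^2 \ge \eta_t + \frac{\delta\varepsilon^2}{16d^2}.$$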



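It remains only to show why the above result is not spoiled if $A$ or $B$ receive other messages before $A$ sends its message to $B$. Let $u$ be the first time step after $t$ in which $A$ sends a message to $B$, and suppose the steps between $t$ and $u$ somehow reduce the distance between $E_A$ and $E_B$:

$$\|E_{A,u} - E_{B,u}\|_2^2 \le \frac{\delta\varepsilon^2}{16d^2}.$$

Then by the triangle inequality (again!):

$$\|E_{A,u} - E_{A,t}\|_2 + \|E_{B,u} - E_{B,t}\|_2 \ge \|E_{B,t} - E_{A,t}\|_2 - \|E_{B,u} - E_{A,u}\|_2 > \frac{\sqrt{\delta}\,\varepsilon}{2d} - \frac{\sqrt{\delta}\,\varepsilon}{4d} = \frac{\sqrt{\delta}\,\varepsilon}{4d},$$

so either

$$\|E_{A,u} - E_{A,t}\|_2^2 > \frac{\delta\varepsilon^2}{64d^2} \quad\text{or}\quad \|E_{B,u} - E_{B,t}\|_2^2 > \frac{\delta\varepsilon^2}{64d^2}.$$

Suppose the former without loss of generality. Then since $\{\Omega_{A,u}(\omega)\}_{\omega\in\Omega}$ refines $\{\Omega_{A,t}(\omega)\}_{\omega\in\Omega}$,

$$\|E_{A,u}\|_2^2 = \|E_{A,t}\|_2^2 + \|E_{A,u} - E_{A,t}\|_2^2 > \eta_t + \frac{\delta\varepsilon^2}{64d^2}.$$

We have shown that $\max_i \|E_{i,t+2N}\|_2^2 \ge \eta_t + \Omega(\delta\varepsilon^2/d^2)$, from which it follows that $\eta_{t+6N} \ge \eta_t + \Omega(\delta\varepsilon^2/d^2)$. Hence the constraint $\eta_t \le 1$ yields an upper bound of $O(Nd^2/(\delta\varepsilon^2))$ on the number of messages. ■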

Let us make three remarks about Theorem 10. First, naturally one can combine Theorems 10 and 6
to obtain an $N$-agent protocol in which the messages are discrete. We omit the details here. Second, all
we really need about the order of messages is that information gets propagated from any agent in G to any 
other in a reasonable number of steps. Our spanning-tree construction was designed to guarantee this, but 
sending messages in a random order (for example) would also work. Third, it seems fair to assume that 
many agents send messages in parallel; if so, our complexity bound can almost certainly be improved. 
4 Computational Complexity

The previous sections have weakened the idea that communication cost is a fundamental barrier to agreement. 
However, we have glossed over the issue of computational cost entirely. A protocol that requires only 
$O(1/(\delta\varepsilon^2))$ messages has little real-world relevance if it would take Alice and Bob billions of years to
calculate the messages! Moreover, all protocols discussed above seem to have that problem, since the
number of possible states $|\Omega|$ could be exponential in the length $n$ of the agents' inputs.

Recognizing this issue, Hanson introduced the notion of a "Bayesian wannabe": a computationally-bounded agent that can still make sense of what its expectations would be if it had enough computational power to be a Bayesian. He then showed that under certain assumptions, if two Bayesian wannabes agree to disagree about the expectation of a function $f(\omega)$, then they must also disagree about some variable that is independent of the state of the world $\omega \in \Omega$. However, Hanson's result does not suggest a protocol by which two Bayesian wannabes who agree about all state-independent variables could come to agree about $f$ as well.

Admittedly, if the two wannabes have very limited abilities, it might be trivial to get them to agree. For example, if Alice and Bob both ignore all their evidence and estimate $f(\omega) = 1/3$, then they agree before exchanging even a single message. But this example seems contrived: after all, if one of the agents (with equal justification) estimated $f(\omega) = 2/3$, then no sequence of messages would ever cause them to agree within $\varepsilon < 1/3$. So informally, what we really want to know is whether two wannabes will always agree, having put in a "good-faith effort" to emulate Bayesian rationality.

We are thus led to the following question. Is there an agreement protocol that

(i) would cause two computationally-unbounded Bayesians to $(\varepsilon,\delta)$-agree after a small number of messages, and

(ii) can be simulated using a small amount of computation?

We will say shortly what we mean by a "small amount of computation." By "simulate," we mean that a computationally-unbounded referee, given the state $\omega \in \Omega$ together with a transcript $M = (m_1, \ldots, m_R)$ of all messages exchanged during the protocol, should be unable to decide (with non-negligible bias) whether Alice and Bob were Bayesians following the protocol exactly, or Bayesian wannabes merely simulating it. More formally, let $\mathcal{B}(\omega)$ be the probability distribution over message transcripts, assuming Alice and Bob are Bayesians and the state of the world is $\omega$. Likewise, let $\mathcal{W}(\omega)$ be the distribution assuming Alice and Bob are wannabes. Then we require that for all Boolean functions $\Phi(\omega, M)$,

$$\left|\Pr_{\omega\in\mathcal{D},\, M\in\mathcal{B}(\omega)}\left[\Phi(\omega,M)=1\right] - \Pr_{\omega\in\mathcal{D},\, M\in\mathcal{W}(\omega)}\left[\Phi(\omega,M)=1\right]\right| \le \zeta \qquad (*)$$

where $\zeta$ is a parameter that can be made as small as we like (say 0.00001).
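It may help to note that the best possible $\Phi$ in requirement (*) accepts exactly those pairs $(\omega, M)$ that are more likely under the Bayesian distribution than under the wannabe distribution, so the left-hand side of (*), maximized over $\Phi$, equals the total variation distance between the two joint distributions. The Python sketch below illustrates this standard fact on finite distributions; the function names and the two toy distributions are hypothetical.

```python
from fractions import Fraction


def total_variation(p, q):
    """Total variation distance between finite distributions given as {outcome: prob}."""
    support = set(p) | set(q)
    return sum(abs(p.get(z, 0) - q.get(z, 0)) for z in support) / 2


def best_advantage(p, q):
    """Advantage of the distinguisher Phi(z) = [p(z) > q(z)]; equals the TV distance."""
    accept = [z for z in set(p) | set(q) if p.get(z, 0) > q.get(z, 0)]
    return sum(p.get(z, 0) for z in accept) - sum(q.get(z, 0) for z in accept)


# Hypothetical joint distributions over (state, transcript) pairs.
bayes = {("w", "m1"): Fraction(1, 2), ("w", "m2"): Fraction(1, 2)}
wannabe = {("w", "m1"): Fraction(3, 4), ("w", "m2"): Fraction(1, 4)}
zeta = total_variation(bayes, wannabe)  # smallest zeta for which (*) holds here
```

In this toy case the smallest admissible $\zeta$ is $1/4$, and the distinguisher that accepts only the transcript `m2` attains it.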

A consequence of the requirement (*) is that even if Alice is computationally unbounded, she cannot
decide with bias greater than $\zeta$ whether Bob is also unbounded, judging only from the messages he sends to
her. For if Alice could decide, then so could our hypothetical referee, who learns at least as much about Bob 
as Alice does. Though a little harder to see, another consequence is that if Alice is unbounded, but knows 
Bob to be bounded and takes his algorithm into account when computing her expectations, her messages 
will still be statistically indistinguishable from what they would have been had she believed that Bob was 
unbounded. Indeed, no beliefs, beliefs about beliefs, etc., about whether either agent is bounded or not can 
significantly affect the sequence of messages, since the truth or falsehood of those beliefs is almost irrelevant 
to predicting the agents' future messages. Also, if Alice is unbounded for some steps of the protocol but 
bounded for others, then Bob will never notice these changes, and would hardly behave any differently were 
he told of them. 

Because of these considerations, we claim that, while simulating a Bayesian agreement protocol might 
not be the only way for two Bayesian wannabes to reach an "honest" agreement, it is certainly a sufficient 
way. Therefore, if we can show how to meet even the stringent requirement (*), this will provide strong
evidence that computation time is not a fundamental barrier to agreement. 

But what do we mean by computation time? We assume the state space $\Omega$ is a subset of $\{0,1\}^n \times \{0,1\}^n$, so that Alice's initial knowledge is an $n$-bit string $x$, and Bob's is an $n$-bit string $y$. Given the prior distribution $\mathcal{D}$ over $(x,y)$ pairs, let $\mathcal{D}_{A,x}$ be Alice's posterior distribution over $y$ conditioned on $x$, and let $\mathcal{D}_{B,y}$ be Bob's posterior distribution over $x$ conditioned on $y$. The following two computational assumptions are the only ones that we make:

(1) Alice and Bob can both evaluate $f(\omega)$ for any $\omega \in \Omega$.

(2) Alice and Bob can both sample from $\mathcal{D}_{A,x}$ for any $x \in \{0,1\}^n$, and from $\mathcal{D}_{B,y}$ for any $y \in \{0,1\}^n$.

Our simulation procedure will not have access to descriptions of $f$ or $\mathcal{D}$; it can learn about them only
by calling subroutines for (1) and (2) respectively. The complexity of the procedure will then be expressed 
in terms of the number of subroutine calls, other computations adding a negligible amount of time. Thus, 
we might stipulate that both subroutines should run in time polynomial in n. On the other hand, n could 
be extremely large — otherwise the agents would simply exchange their entire inputs and be done! So we 
probably want to be even stricter, and stipulate that the subroutines should use time (say) logarithmic in n, 
albeit with many parallel processors. The latter seems like a better model for the human brain; after all, to 
reach an opinion based on our current knowledge, we do not contemplate every fact we know in sequential 
order, but instead zero in quickly on the relevant facts. In any case, the simulation procedure will treat the 
subroutines purely as "black boxes," so decisions about their implementation will not affect our results. 

The justification for assumptions (1) and (2) is that without them, it is hard to see how the agents could 
estimate their expectations even before they started talking to each other. In other words, we have to 
assume the agents enter the conversation with minimal tools for reasoning about their universe of discourse. 
We do not assume that those tools extend to reasoning about each other's expectations, expectations of 
expectations, etc., conditioned on a sequence of messages exchanged. That the tools do extend in this way 
is what we intend to prove. 

The one assumption that seems debatable to us is that Alice can sample from Bob's distribution $\mathcal{D}_{B,y}$, and Bob can sample from Alice's distribution $\mathcal{D}_{A,x}$. How can an agent possibly be expected to possess "someone else's" sampling subroutine? On further reflection, though, this question is simply a variant of an earlier question: why can we assume that Alice knows Bob's set of possible states $\Omega_B(\omega)$, and that Bob knows $\Omega_A(\omega)$? For if Alice knows $\Omega_B(\omega)$ as well as Bob does, then there is no particular reason why she should not be able to sample from it as well as he can. Again, the reason the agents know each other's partitions is that the state of the world $\omega \in \Omega$ includes both agents' mental states as part of it. None of this seems too out of line with everyday experience: whenever we use what we know to try to figure out what someone else might be thinking, a Bayesian would say we are sampling an $\omega$ from our set of possible states, then sampling from what the other person's set of possible states would be if the state of the world were $\omega$.

Finally, let us note that assumptions (1) and (2) can both be relaxed. In particular, it is enough to approximate $f(\omega)$ to within an additive factor $\eta$ with probability at least $1-\eta$, in time that increases polynomially in $1/\eta$. It is also enough to sample from a distribution whose variation distance from $\mathcal{D}_{A,x}$ or $\mathcal{D}_{B,y}$ is at most $\eta$, in time polynomial in $1/\eta$. Indeed, since the probabilities and $f$-values are real numbers, we will generally need to approximate in order to represent them with finite precision. For ease of presentation, though, we assume exact algorithms in what follows.

4.1 Smoothed Standard Protocol 

Naively, requirement (*) seems impossible to satisfy. All of the agreement protocols discussed earlier in this paper are easy to distinguish from any efficient simulation of them. For consider Alice's first message to Bob. If Alice's expectation $E_{A,0}$ is below some threshold $c$, she sends one message, whereas if $E_{A,0} \ge c$, she sends a different message. Even if we fix $f$, and limit probabilities and $f$-values to (say) $n$ bits of precision, we can arrange things so that $E_{A,0}(\omega)$ is exponentially close to $c$, sometimes greater and sometimes less, with high probability over $\omega$. Then to decide which message to send, Alice needs to evaluate $f$ exponentially many times.

We resolve this issue by having the agents add random noise to their messages ("smoothing" them), even if they are unbounded Bayesians. This noise does not prevent the agents from reaching $(\varepsilon,\delta)$-agreement. On the other hand, it makes their messages easier to simulate. For unlike real numbers $a \ne b$, which are perfectly distinguishable no matter how close they are, two probability distributions with close means may be hard to distinguish, like wavepackets in quantum mechanics.
Figure 4: Agent $i$ "smoothes" its expectation $E_{i,t}$ with triangular noise before sending it.



In the smoothed standard protocol, Alice generates her messages to Bob as follows. Let $b \ge \log_2(200/\varepsilon)$ be a positive integer to be specified later. Then let $\beta$ be an integer multiple of $2^{-b}$ between $\varepsilon/50$ and $\varepsilon/40$, and let $L = 2^b\beta$. First Alice rounds her current expectation $E_{A,t}$ of $f$ to the nearest multiple of $2^{-b}$. Denote the result by $\mathrm{round}(E_{A,t})$. She then draws an integer $r \in \{-L, \ldots, L\}$, according to a triangular distribution in which $r = j$ with probability $(L - |j|)/L^2$ (see Figure 4). The message she sends Bob is $m_{t+1} = \mathrm{round}(E_{A,t}) + 2^{-b} r$. Observe that since $m_{t+1} \in [-\beta, 1+\beta]$, there are at most $2^b(1+2\beta) + 1$ possible values of $m_{t+1}$, meaning Alice's message takes only $b+1$ bits to specify. After receiving the message, Bob updates his expectation of $f$ using Bayes' rule, then draws an integer $r \in \{-L, \ldots, L\}$ according to the same triangular distribution and sends Alice $m_{t+2} = \mathrm{round}(E_{B,t+1}) + 2^{-b} r$. The two agents continue to send messages in this way.
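The following Python sketch shows one way to pick the parameters and generate a smoothed message. Exact rational arithmetic is used so that rounding to multiples of $2^{-b}$ is exact. Choosing $L = \lceil 2^b\varepsilon/50 \rceil$ is an assumption; it works because $2^{-b} \le \varepsilon/200 = \varepsilon/40 - \varepsilon/50$, so rounding $\varepsilon/50$ up to a multiple of $2^{-b}$ cannot overshoot $\varepsilon/40$. The names `smoothing_parameters`, `triangular_r`, and `smoothed_message` are hypothetical.

```python
import math
import random
from fractions import Fraction


def smoothing_parameters(eps):
    """Pick b >= log2(200/eps), L, and beta = L/2^b in [eps/50, eps/40]."""
    eps = Fraction(eps)
    b = math.ceil(math.log2(float(200 / eps)))
    L = math.ceil(2 ** b * eps / 50)
    beta = Fraction(L, 2 ** b)
    assert eps / 50 <= beta <= eps / 40
    return b, L, beta


def triangular_r(L, rng):
    """Draw r in {-L, ..., L} with Pr[r = j] = (L - |j|) / L^2."""
    values = list(range(-L, L + 1))
    return rng.choices(values, weights=[L - abs(j) for j in values])[0]


def smoothed_message(E, b, L, rng):
    """round(E) to the nearest multiple of 2^-b, then add 2^-b * r with triangular r."""
    step = Fraction(1, 2 ** b)
    return round(Fraction(E) / step) * step + step * triangular_r(L, rng)


# Hypothetical usage with eps = 1/10 and a current expectation of 0.3.
b, L, beta = smoothing_parameters(Fraction(1, 10))
rng = random.Random(0)
m = smoothed_message(0.3, b, L, rng)
```

With $\varepsilon = 1/10$ this gives $b = 11$, $L = 5$, and $\beta = 5/2048$, and every message is a multiple of $2^{-11}$ lying within $\beta + 2^{-12}$ of the true expectation.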

The reader might be wondering why we chose triangular noise, and whether other types of noise would work equally well. The answer is that we want the message distribution to have three basic properties. First, it should be concentrated about a mean of $E_{i,t}$ with variance at most $\sim\beta^2$. Second, shifting the mean by $\eta \le \beta$ should shift the distribution by at most $\sim\eta/\beta$ in variation distance. And third, the derivative of the probability density function should never exceed $\sim\eta/\beta^2$ in absolute value. Thus, Gaussian noise would also work, though it is somewhat harder to analyze than triangular noise. However, noise that is uniform over $[-\beta, \beta]$ would not work (so far as we could tell), since it violates the third property.

Before we analyze the protocol, we need to develop some notation. Let $M_t = (m_1, \ldots, m_t)$ consist of the first $t$ messages that Alice and Bob exchange. Since messages are now probabilistic, the agents' expectations of $f$ at step $t$ depend not only on the initial state of the world $\omega$, but also on $M_t$. When we want to emphasize this, we denote the agents' expectations by $E_{A,t}(\omega, M_t)$ and $E_{B,t}(\omega, M_t)$ respectively. Another important consequence of messages being probabilistic is that after an agent has received a message, its posterior distribution over $\omega$ is no longer obtainable by restricting the prior distribution $\mathcal{D}$ to a subset of possible states. Thus, we let $\Omega_i(\omega) = \Omega_{i,0}(\omega)$, since we will never refer to $\Omega_{i,t}(\omega)$ for $t > 0$.

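Say the agents $(\varepsilon,\delta)$-agree after the $t$th message if

$$\Pr_{\omega\in\mathcal{D},\, M_t}\left[|E_{A,t}(\omega, M_t) - E_{B,t}(\omega, M_t)| > \varepsilon\right] \le \delta.$$

Also, let

$$\|E_{i,t}\|_2^2 = \mathrm{EX}_{\omega\in\mathcal{D},\, M_t}\left[E_{i,t}(\omega, M_t)^2\right].$$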

Theorem 11 For all $f, \mathcal{D}$, the smoothed standard protocol causes Alice and Bob to $(\varepsilon,\delta)$-agree after at most $2/(\delta\varepsilon^2)$ messages.

Proof. Similarly to Theorem 6, we let $E_{C,t}$ be the expectation of a third party Charlie who sees all messages between Alice and Bob, but who knows neither their inputs nor the random bits that they use to produce their messages. We then track $\|E_{C,t}\|_2^2$.

Assume that $\Pr[|E_{A,t} - E_{B,t}| > \varepsilon] > \delta$ and that Alice sends the $t$th message $m_t$. Notice that $m_t$ cannot deviate from Alice's expectation $E_{A,t} = E_{A,t-1}$ by more than $2\beta$, since $|\mathrm{round}(E_{A,t}) - E_{A,t}| \le \beta$ and $|m_t - \mathrm{round}(E_{A,t})| \le \beta$. So keeping $M_t$ fixed,

$$|E_{A,t}(\omega, M_t) - E_{A,t}(\omega', M_t)| \le 4\beta$$

for all $\omega, \omega'$. Now Charlie's expectation $E_{C,t}(\omega, M_t)$ is just an average of $E_{A,t}(\omega', M_t)$'s, so it follows that

$$|E_{C,t}(\omega, M_t) - E_{A,t}(\omega, M_t)| \le 4\beta$$

as well. Similarly, after Bob sends the $(t+1)$st message,

$$|E_{C,t+1}(\omega, M_{t+1}) - E_{B,t+1}(\omega, M_{t+1})| \le 4\beta.$$

Therefore

$$\Pr\left[|E_{C,t+1} - E_{C,t}| > \varepsilon - 8\beta\right] \ge \Pr\left[|E_{A,t} - E_{B,t}| > \varepsilon\right] > \delta,$$

using the triangle inequality and the fact that $E_{B,t+1} = E_{B,t}$. The final observation is that Charlie's partition of $\Omega \times M_{t+1}$ at step $t+1$ refines his partition at step $t$, so by Proposition 3

$$\|E_{C,t+1}\|_2^2 - \|E_{C,t}\|_2^2 = \|E_{C,t+1} - E_{C,t}\|_2^2 > \delta(\varepsilon - 8\beta)^2.$$

Since $\|E_{C,t}\|_2^2 \le 1$, this yields an upper bound of $1/(\delta(\varepsilon - 8\beta)^2) < 2/(\delta\varepsilon^2)$ on the number of messages; the last inequality holds because $\beta \le \varepsilon/40$ gives $(\varepsilon - 8\beta)^2 \ge (4\varepsilon/5)^2 > \varepsilon^2/2$. ■

4.2 Simulating the Smoothed Protocol 

Having proved that the smoothed standard protocol works, in this section we explain how Alice and Bob 
can simulate the protocol. In the ideal case — where the agents have unlimited computational power — they 
use the following recursive formulas. Let 

$$A(m_t, E_{i,t-1}) = \begin{cases} 1 - |m_t - \mathrm{round}(E_{i,t-1})|/\beta & \text{if } |m_t - \mathrm{round}(E_{i,t-1})| \le \beta, \\ 0 & \text{otherwise} \end{cases}$$

be proportional to the probability that agent $i$ sends message $m_t$, given that its expectation is $E_{i,t-1}$. Also, let $q_t(\omega, M_t)$ be proportional to the joint probability of messages $m_1, \ldots, m_t$ assuming the true state of the world is $\omega$. Then assuming $t$ is even and suppressing dependencies on $M_t$, for all $X, Y$ we have

$$\begin{aligned}
q_t(Y) &= q_{t-2}(Y)\, A(m_t, E_{B,t-1}(Y)), \\
q_{t-1}(X) &= q_{t-3}(X)\, A(m_{t-1}, E_{A,t-2}(X)), \\
E_{A,t}(X) &= \frac{\mathrm{EX}_{Y\in\Omega_A(X)}[q_t(Y)\, f(Y)]}{\mathrm{EX}_{Y\in\Omega_A(X)}[q_t(Y)]}, \\
E_{B,t-1}(Y) &= \frac{\mathrm{EX}_{X\in\Omega_B(Y)}[q_{t-1}(X)\, f(X)]}{\mathrm{EX}_{X\in\Omega_B(Y)}[q_{t-1}(X)]},
\end{aligned}$$

with the base cases $q_0(Y) = q_{-1}(X) = 1$ for all $X, Y$. The correctness of these formulas follows from simple Bayesian manipulations. Having computed $E_{i,t}(\omega)$ by the formulas above (note that this does not require knowledge of $\omega$), all agent $i$ needs to do is draw $r \in \{-L, \ldots, L\}$ from the triangular distribution, then send the message

$$m_{t+1} = \mathrm{round}(E_{i,t}(\omega)) + 2^{-b} r.$$
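As a sanity check on the weight function $A$ and the first Bayesian update, here is a small Python sketch. It assumes the states in $\Omega_B(Y)$ are equally likely under $\mathcal{D}$ (so the conditional expectation is a plain weighted average) and uses exact rationals; `round_to_grid`, `A`, and `weighted_expectation` are hypothetical names. The check confirms that $A(m, E)$ equals $(L - |j|)/L$ when $m = \mathrm{round}(E) + 2^{-b} j$, i.e. it is proportional to the triangular probabilities.

```python
from fractions import Fraction


def round_to_grid(E, b):
    """Nearest multiple of 2^-b (ties broken by Python's round)."""
    step = Fraction(1, 2 ** b)
    return round(Fraction(E) / step) * step


def A(m, E, b, beta):
    """Weight proportional to Pr[an agent with expectation E sends message m]."""
    d = abs(Fraction(m) - round_to_grid(E, b))
    return 1 - d / beta if d <= beta else Fraction(0)


def weighted_expectation(states, q, f):
    """EX[q f] / EX[q] over a finite set of equally likely states.

    With q(X) = A(m_1, E_{A,0}(X)) this is E_{B,1}(Y) for states = Omega_B(Y).
    """
    num = sum(q(s) * f(s) for s in states)
    den = sum(q(s) for s in states)
    return num / den
```

For example, with $b = 4$ and $\beta = 3/16$ (so $L = 3$), a state where Alice's expectation is $1/2$ fully supports the message $m_1 = 1/2$, while a state where it is $0$ gives that message weight 0.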

In the real case, the agents are computationally bounded, and can no longer afford the luxury of taking expectations over the exponentially large sets $\Omega_i$. A natural idea is to compensate by somehow sampling those sets. But since we never assumed the ability to sample $\Omega_i$ conditioned on messages $m_1, \ldots, m_t$, it is not obvious how to make that idea work. Our solution will consist of two phases: the construction of "sampling-trees," which involves no communication, followed by a message-by-message simulation of the ideal protocol. Let us describe these phases in turn.

(I) Sampling-Tree Construction. Alice creates a tree $T_A$ with height $R$ and branching factor $K$. Here $R \le 2/(\delta\varepsilon^2)$ is the number of messages, and $K$ is a parameter to be specified later. Let $\mathrm{root}_A$ be the root node of $T_A$, and let $S(v)$ be the set of children of node $v$. Then Alice labels each of the $K$ nodes $w \in S(\mathrm{root}_A)$ by a sample $Y_w \in \Omega_A(\omega)$, drawn independently from her posterior distribution $\mathcal{D}_{A,x}$. Next, for each $w \in S(\mathrm{root}_A)$, she labels each of the $K$ nodes $v \in S(w)$ by a sample $X_v \in \Omega_B(Y_w)$, drawn independently from Bob's distribution $\mathcal{D}_{B,y}$ where $Y_w = (x,y)$. She continues recursively in this manner, labeling each $v$ an even distance from the root with a sample $X_v \in \Omega_B(Y_w)$ where $w$ is the parent of $v$, and each $w$ an odd distance from the root with a sample $Y_w \in \Omega_A(X_v)$ where $v$ is the parent of $w$. Thus her total number of samples is

$$K + K^2 + \cdots + K^R = \frac{K^{R+1} - 1}{K - 1} - 1.$$

Similarly, Bob creates a tree $T_B$ with height $R$ and branching factor $K$. Let $\mathrm{root}_B$ be the root of $T_B$; then Bob labels each $v \in S(\mathrm{root}_B)$ by a sample $X_v \in \Omega_B(\omega)$, each child $w \in S(v)$ of each $v \in S(\mathrm{root}_B)$ by a sample $Y_w \in \Omega_A(X_v)$, and so on, alternating between $\Omega_B$ and $\Omega_A$ at successive levels. As a side remark, if the agents share a random string, then there is no reason for them not to use the same set of samples. However, we cannot assume that such a string is available.
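The construction in (I) can be sketched in Python as follows, assuming (hypothetically) that each agent is handed a sampler `samplers[i](state, rng)` that draws from agent $i$'s posterior given the relevant coordinate of `state`. The toy samplers below assume a uniform product prior, so Alice keeps $x$ and resamples $y$, while Bob keeps $y$ and resamples $x$. The names `Node`, `build_sampling_tree`, and `count_samples` are illustrative only; the checks confirm the sample count $(K^{R+1}-1)/(K-1) - 1$ and the alternation between levels.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = Tuple[Optional[int], Optional[int]]
Sampler = Callable[[State, random.Random], State]


@dataclass
class Node:
    label: State                     # X_v or Y_w; the root holds only the agent's own input
    children: List["Node"] = field(default_factory=list)


def build_sampling_tree(root_label: State, K: int, R: int,
                        samplers: List[Sampler], first_agent: int,
                        rng: random.Random) -> Node:
    """Height-R, branching-K tree whose levels alternate between the agents' posteriors."""
    root = Node(root_label)
    frontier, agent = [root], first_agent
    for _ in range(R):
        next_frontier = []
        for v in frontier:
            for _ in range(K):
                # Independent sample conditioned on the parent's label.
                child = Node(samplers[agent](v.label, rng))
                v.children.append(child)
                next_frontier.append(child)
        frontier, agent = next_frontier, 1 - agent
    return root


def count_samples(node: Node) -> int:
    """Number of labelled (non-root) nodes."""
    return sum(1 + count_samples(c) for c in node.children)


NX, NY = 8, 8
samplers: List[Sampler] = [
    lambda s, rng: (s[0], rng.randrange(NY)),   # Alice's posterior: keep x, resample y
    lambda s, rng: (rng.randrange(NX), s[1]),   # Bob's posterior: keep y, resample x
]
tree_A = build_sampling_tree((3, None), K=3, R=4, samplers=samplers,
                             first_agent=0, rng=random.Random(1))
tree_B = build_sampling_tree((None, 5), K=3, R=4, samplers=samplers,
                             first_agent=1, rng=random.Random(2))
```

Using separate generators for the two trees mirrors the remark above: without a shared random string, Alice and Bob end up with different sample sets.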

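(II) Simulation. We now explain how the agents can use the samples from (I) to simulate the smoothed standard protocol. First Alice estimates her expectation $E_{A,0}$ by the quantity

$$(E_{A,0}(\mathrm{root}_A))_A = \mathrm{EX}_{w\in S(\mathrm{root}_A)}[f(Y_w)] = \frac{1}{K}\sum_{w\in S(\mathrm{root}_A)} f(Y_w).$$

She then chooses a random $r \in \{-L, \ldots, L\}$ and sends Bob

$$m_1 = \operatorname{round}((E_{A,0}(\mathrm{root}_A))_A) + 2^{-b} r.$$

On receiving the message, for each $v \in S(\mathrm{root}_B)$ Bob computes

$$(E_{A,0}(v))_B = \frac{1}{K}\sum_{w\in S(v)} f(Y_w),$$

his estimate of $E_{A,0}(X_v)$ assuming $\omega = X_v$. He then defines

$$(q_1(v))_B = \Delta(m_1, (E_{A,0}(v))_B)$$

and estimates his own expectation $E_{B,1}(\omega)$ by

$$(E_{B,1}(\mathrm{root}_B))_B = \frac{\sum_{v\in S(\mathrm{root}_B)} (q_1(v))_B\, f(X_v)}{\sum_{v\in S(\mathrm{root}_B)} (q_1(v))_B}.$$

Finally, he chooses a random $r \in \{-L, \ldots, L\}$ and sends Alice

$$m_2 = \operatorname{round}((E_{B,1}(\mathrm{root}_B))_B) + 2^{-b} r.$$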
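In general, if $t$ is even then the recursive formulas for agent $i$ are

$$(q_t(w))_i = (q_{t-2}(w))_i\,\Delta(m_t, (E_{B,t-1}(w))_i),$$
$$(q_{t-1}(v))_i = (q_{t-3}(v))_i\,\Delta(m_{t-1}, (E_{A,t-2}(v))_i),$$
$$(E_{A,t}(v))_i = \frac{\sum_{w\in S(v)} (q_t(w))_i\, f(Y_w)}{\sum_{w\in S(v)} (q_t(w))_i},$$
$$(E_{B,t-1}(w))_i = \frac{\sum_{v\in S(w)} (q_{t-1}(v))_i\, f(X_v)}{\sum_{v\in S(w)} (q_{t-1}(v))_i},$$

with the base cases $(q_0(w))_i = (q_{-1}(v))_i = 1$ for all $w, v$. Agent $i$ computes a message $m_t$ in the obvious way, from its expectation at the root of $T_i$:

$$m_t = \operatorname{round}((E_{i,t-1}(\mathrm{root}_i))_i) + 2^{-b} r.$$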
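The recursive formulas in (II) can be sketched in Python for one agent's tree. The sketch assumes a tree already built as in (I) (here a tiny hand-made one), a list `messages` holding $m_1, m_2, \ldots$, and the convention that Alice sends the odd-numbered messages. Each child's weight is a product of $\Delta$ factors over the messages sent by the agent whose expectation is evaluated at that child (Bob at the children of Alice's root, Alice one level further down, and so on), exactly as in the formulas for $(q_t(w))_i$ and $(q_{t-1}(v))_i$. The fallback to an unweighted mean when every child has weight zero is our own assumption; the text does not say what happens in that case. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    label: Tuple[int, int]
    children: List["Node"] = field(default_factory=list)


def round_to_grid(e: float, b: int) -> float:
    step = 2.0 ** -b
    return round(e / step) * step


def delta(m: float, e: float, eps: float, b: int) -> float:
    d = abs(m - round_to_grid(e, b))
    return 1.0 - d / eps if d <= eps else 0.0


def make_estimator(f: Callable[[Tuple[int, int]], float], messages: List[float],
                   eps: float, b: int, root_agent: int):
    """Return E(node, depth, t): the tree estimate of the expectation, after m_1..m_t,
    of the agent owning node's level (0 = Alice, 1 = Bob)."""
    cache: Dict[Tuple[int, int], float] = {}

    def owner(depth: int) -> int:
        return (root_agent + depth) % 2

    def sender(s: int) -> int:
        return 0 if s % 2 == 1 else 1          # Alice sends m_1, m_3, ...

    def E(u: Node, depth: int, t: int) -> float:
        key = (id(u), t)
        if key in cache:
            return cache[key]
        if not u.children:
            raise ValueError("tree is too shallow for this many messages")
        child_owner = owner(depth + 1)
        weights = []
        for c in u.children:
            q = 1.0
            for s in range(1, t + 1):          # only messages sent by the child's owner
                if sender(s) == child_owner:
                    q *= delta(messages[s - 1], E(c, depth + 1, s - 1), eps, b)
            weights.append(q)
        values = [f(c.label) for c in u.children]
        total = sum(weights)
        if total == 0.0:                       # assumption: fall back to the plain mean
            est = sum(values) / len(values)
        else:
            est = sum(w * v for w, v in zip(weights, values)) / total
        cache[key] = est
        return est

    return E
```

The memo table keyed by `(id(node), t)` means each node is evaluated once per time step, which is the $O(K^R)$ per-round cost discussed next. Alice's estimate for message $m_t$ is `E(root_A, 0, t - 1)`.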

That completes the description of the simulation procedure. Its complexity is easily determined: let $T_1$ be the number of computational steps needed to sample from $\mathcal{D}_{A,x}$ or $\mathcal{D}_{B,y}$, and let $T_2$ be the number of steps needed to evaluate $f$. Then both agents use $O(RK^R(T_1 + T_2))$ steps, where we have summed over all $R$ communication rounds. Thus, the complexity is exponential in $R \approx 2/(\delta\varepsilon^2)$; on the other hand, it has no dependence on $n$.
4.3 Analysis

Our goal is to show that the message sequence in the simulated protocol is statistically indistinguishable from the sequence in the ideal protocol, for some reasonable sample size $K$. Here 'reasonable', unfortunately, is still quite huge: of order $(11/\varepsilon)^{2R}/\zeta^2$, where $\zeta$ is the maximum bias with which a referee can distinguish the conversations. So assuming $\zeta \ge \varepsilon/50$ and $R \le 2/(\delta\varepsilon^2)$, the total number of computational steps is of order

$$(T_1 + T_2)\cdot\exp\left(\frac{8\ln(550/\varepsilon)}{\delta^2\varepsilon^4} + \frac{4\ln(1/\zeta)}{\delta\varepsilon^2}\right).$$



The reader might complain that this bound is not at all reasonable: for example, if $\varepsilon = \delta = 1/2$, then it translates into an astronomically large number of subroutine calls! Let us make two points in response. First, we do show that the number of subroutine calls needed is independent of $n$, and that it grows "only" exponentially in a polynomial in $1/\delta$ and $1/\varepsilon$. Theoretical computer scientists often see cases in which the first polynomial-time algorithm for a problem has a completely impractical complexity, such as a polynomial of enormous degree. However, once the problem is known to be in polynomial time, it is usually possible to reduce the exponent to obtain a truly practical algorithm. In our case, we conjecture that the factor of $1/(\delta^2\varepsilon^4)$ in the exponent could be reduced substantially, perhaps even to $1/(\delta\varepsilon^2)$; certainly the constants in the exponent can be reduced. The second point is that the complexity is so large only because we never assumed the agents can sample from their sets of possible states conditioned on messages exchanged. So the best they can do is to sample a huge number of states from their original sets $\Omega_A$ and $\Omega_B$, then retain the few that are compatible with the messages. However, it seems likely that agents would have at least some ability to sample conditioned on messages. After all, we assumed that they enter the conversation with the ability to sample, and presumably they have had other conversations in the past! In practice, then, the complexity will probably be better than the worst-case estimate above.

How do we prove the simulation theorem? In one sense, the proof is 'merely' an exercise in error analysis and large deviation bounds. However, the details are extremely subtle and difficult to get right. The problem is that if a message has probability $q$ from its recipient's point of view, then order $1/q$ samples are needed to find even a single input that could have caused the sender to produce that message. Fortunately, low-probability messages are unlikely to be sent, for almost tautological reasons that we spell out in Lemma 14. However, because the sampling trees $T_i$ are so large, with overwhelming probability they contain some nodes $w$ with minuscule values of $(q_t(w))_A$. We need to argue that the errors introduced by these "bad nodes" are washed out by the good nodes before they can propagate to the root.

The proof will repeatedly use the Chernoff-Hoeffding bound (Theorem 5). As shown by the following corollary, Theorem 5 sometimes lets us estimate the mean of a random variable, even if we cannot sample that variable directly.

Corollary 12 Let $p_1, \ldots, p_n$ and $x_1, \ldots, x_n$ belong to $[0,1]$, and let $P = p_1 + \cdots + p_n$ and $X = p_1x_1 + \cdots + p_nx_n$. If we choose $K$ indices $i(1), \ldots, i(K)$ uniformly at random from $\{1, \ldots, n\}$, then

$$\Pr\left[\left|\frac{p_{i(1)}x_{i(1)} + \cdots + p_{i(K)}x_{i(K)}}{p_{i(1)} + \cdots + p_{i(K)}} - \frac{X}{P}\right| > \alpha\right] < 4e^{-\alpha^2(P/n)^2K/2}.$$



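Proof. Let

$$\tilde P = \frac{n}{K}\left(p_{i(1)} + \cdots + p_{i(K)}\right), \qquad \tilde X = \frac{n}{K}\left(p_{i(1)}x_{i(1)} + \cdots + p_{i(K)}x_{i(K)}\right).$$

Then since $\tilde X \le \tilde P$,

$$\left|\frac{\tilde X}{\tilde P} - \frac{X}{P}\right| = \frac{\left|\tilde X[P - \tilde P] - \tilde P[X - \tilde X]\right|}{\tilde P P} \le \frac{|P - \tilde P|}{P} + \frac{|X - \tilde X|}{P}.$$

So

$$\Pr\left[\left|\frac{\tilde X}{\tilde P} - \frac{X}{P}\right| > \alpha\right] \le \Pr\left[|\tilde P - P| > \frac{\alpha P}{2}\right] + \Pr\left[|\tilde X - X| > \frac{\alpha P}{2}\right].$$

By Theorem 5,

$$\Pr\left[\frac{K}{n}\left|\tilde P - P\right| > \frac{\alpha P K}{2n}\right] < 2e^{-\alpha^2(P/n)^2K/2},$$

and similarly for $|\tilde X - X|$. ■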
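A quick Monte Carlo check of Corollary 12, as a sketch: we assume the $K$ indices are drawn independently with replacement (the setting in which the Chernoff-Hoeffding bound applies), and compare the empirical probability that the self-normalized estimate misses $X/P$ by more than $\alpha$ with $4e^{-\alpha^2(P/n)^2K/2}$. The data are arbitrary and the function names are ours.

```python
import math
import random


def ratio_estimate(p, x, K, rng):
    """(p_i1 x_i1 + ... + p_iK x_iK) / (p_i1 + ... + p_iK) for K uniform indices."""
    idx = [rng.randrange(len(p)) for _ in range(K)]
    den = sum(p[i] for i in idx)
    return sum(p[i] * x[i] for i in idx) / den if den > 0 else 0.0


def empirical_failure(p, x, K, alpha, trials, seed=0):
    rng = random.Random(seed)
    target = sum(pi * xi for pi, xi in zip(p, x)) / sum(p)   # X / P
    misses = sum(abs(ratio_estimate(p, x, K, rng) - target) > alpha
                 for _ in range(trials))
    return misses / trials


def corollary12_bound(p, K, alpha):
    n, P = len(p), sum(p)
    return 4 * math.exp(-alpha ** 2 * (P / n) ** 2 * K / 2)


p = [0.2, 0.9, 0.5, 0.7] * 25
x = [0.1, 0.8, 0.3, 0.6] * 25
K, alpha = 400, 0.2
print(empirical_failure(p, x, K, alpha, trials=1000), corollary12_bound(p, K, alpha))
```

Here the bound is roughly 0.28 while the observed miss rate is far smaller; what matters for the analysis is that the bound depends on $n$ only through the average weight $P/n$.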



We will also need a standard bound for a sum of exponentially distributed variables.

Theorem 13 Let $x_1, \ldots, x_K \in [0,\infty)$ be independent and exponentially distributed with mean 1 (that is, $\Pr[x_i > x] = e^{-x}$). Then for all $\alpha > 0$,

$$\Pr[x_1 + \cdots + x_K > (1+\alpha)K] < \left(\frac{e^{\alpha}}{1+\alpha}\right)^{-K}.$$
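As a sanity check on Theorem 13, one can compare the bound with the exact tail: a sum of $K$ independent mean-1 exponentials has the Gamma (Erlang) distribution, so $\Pr[x_1+\cdots+x_K > a] = e^{-a}\sum_{j<K} a^j/j!$. The sketch below evaluates both for $a = (1+\alpha)K$; the function names are ours.

```python
import math


def erlang_tail(K, a):
    """Exact Pr[x_1 + ... + x_K > a] for i.i.d. Exp(1) variables."""
    term, total = 1.0, 1.0
    for j in range(1, K):
        term *= a / j          # a^j / j!
        total += term
    return math.exp(-a) * total


def theorem13_bound(K, alpha):
    """(e^alpha / (1 + alpha))^(-K)."""
    return (math.exp(alpha) / (1 + alpha)) ** (-K)


for K in (1, 5, 20):
    for alpha in (0.1, 0.5, 1.0, 2.0):
        exact = erlang_tail(K, (1 + alpha) * K)
        bound = theorem13_bound(K, alpha)
        print(f"K={K:2d} alpha={alpha:.1f} exact={exact:.3e} bound={bound:.3e}")
```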



For convenience, we will state our results in terms of Alice's tree $T_A$, with the understanding that they apply equally well to $T_B$. Throughout, we assume that $t$ is even and that the $t$th message $m_t$ is sent from Bob to Alice. Let $Q_t = \mathrm{EX}_{Y\in\Omega_A(x)}[q_t(Y)]$ measure the "likelihood" of Alice's situation at step $t$. Then $Q_t/Q_{t-2}$ measures the likelihood of the $t$th message, conditioned on Alice's situation just before she receives it. The following lemma says essentially that "unlikely messages are unlikely."

Lemma 14 For all inputs $x$ of Alice, message sequences $M_{t-1}$, and constants $\gamma > 0$,

$$\Pr_{m_t}\left[\frac{Q_t}{Q_{t-2}} < \frac{\gamma\varepsilon}{2}\right] < \gamma.$$



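Proof. For all $m \in [-\varepsilon, 1+\varepsilon]$,

$$\Pr[m_t = m] = \sum_{j\in\{-L,\ldots,L\}} \Pr\left[\operatorname{round}(E_{B,t-1}(Y)) = m + 2^{-b}j\right]\frac{\Delta(m, m + 2^{-b}j)}{L} = \frac{1}{L}\cdot\frac{\mathrm{EX}_{Y\in\Omega_A(x)}\left[q_{t-2}(Y)\,\Delta(m, E_{B,t-1}(Y))\right]}{Q_{t-2}} = \frac{1}{L}\cdot\frac{Q_t}{Q_{t-2}}$$

from Alice's point of view. So it suffices to observe that

$$\Pr_{m_t}\left[\frac{Q_t}{Q_{t-2}} < \frac{\gamma\varepsilon}{2}\right] = \Pr_{m_t}\left[\Pr[m_t = m] < \frac{\gamma\varepsilon}{2L}\right] \le \frac{\gamma\varepsilon}{2L}\cdot\frac{L(1+2\varepsilon) + \varepsilon}{\varepsilon} < \gamma.$$

Here the first inequality follows from elementary probability theory, together with the fact that there are at most $(1+2\varepsilon)/2^{-b} + 1$ possible messages $m$, and hence the mean of $\Pr_{m_t}[m_t = m]$ over $m$ chosen uniformly at random is at least

$$\frac{1}{(1+2\varepsilon)/2^{-b} + 1} = \frac{\varepsilon}{L(1+2\varepsilon) + \varepsilon}.$$

The second inequality follows since $\varepsilon < 1/4$. ■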
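The proof of Lemma 14 uses the fact that the noise value $r$ has probability $\Delta(m, m+2^{-b}j)/L$. That holds if the triangular distribution is $\Pr[r] = (L-|r|)/L^2$ on $\{-L,\ldots,L\}$ and $2^{-b}L = \varepsilon$, which is the reading we assume here. The proof of Theorem 20 later relies on shifting this distribution by $s$ grid steps moving it by at most $s/L$ in variation distance. Both facts can be checked with exact rational arithmetic; the function names are ours.

```python
from fractions import Fraction


def triangular_pmf(L):
    """Pr[r] = (L - |r|) / L^2 for r in {-L, ..., L} (exact rationals)."""
    return {r: Fraction(L - abs(r), L * L) for r in range(-L, L + 1)}


def shifted_tv(L, s):
    """Variation distance between the noise and the noise shifted by s >= 0 grid steps."""
    p = triangular_pmf(L)
    zero = Fraction(0)
    support = range(-L, L + s + 1)           # covers both supports
    return sum(abs(p.get(r, zero) - p.get(r - s, zero)) for r in support) / 2


for L in (1, 4, 16):
    print(L, sum(triangular_pmf(L).values()), shifted_tv(L, 1))
```

With $s = d_t 2^b$ grid steps and $2^{-b}L = \varepsilon$, the bound $s/L$ is exactly the $d_t/\varepsilon$ used in the proof of Theorem 20.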

A consequence of Lemma 14 is that unlikely sequences of messages are unlikely. For the remainder of this section, let $g = (2e/\varepsilon)\ln K$.



Lemma 15 For all $\gamma > 0$ and all $x$,

$$\Pr_{y,M_t}[Q_t < \gamma] \le g^{t/2}\max\left\{\gamma, \frac{1}{K}\right\}.$$
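Proof. For all $u \in \{2, 4, \ldots, t\}$, let $X_u = \ln(\varepsilon Q_{u-2}/(2Q_u))$. Then

$$Q_t = \frac{2Q_t}{\varepsilon Q_{t-2}}\cdot\frac{2Q_{t-2}}{\varepsilon Q_{t-4}}\cdots\frac{2Q_2}{\varepsilon Q_0}\left(\frac{\varepsilon}{2}\right)^{t/2} = e^{-(X_2 + \cdots + X_t)}\left(\frac{\varepsilon}{2}\right)^{t/2},$$

since $Q_0 = 1$. Furthermore, Lemma 14 implies that for each $u$,

$$\Pr[X_u > x] = \Pr\left[\frac{Q_u}{Q_{u-2}} < \frac{\varepsilon}{2}e^{-x}\right] < e^{-x},$$

even conditioned on $X_2, \ldots, X_{u-2}$. Therefore $X_2 + \cdots + X_t$ is stochastically dominated by a sum of $t/2$ independent exponential variables each with mean 1. So by Theorem 13,

$$\Pr\left[X_2 + \cdots + X_t > (1+\alpha)\frac{t}{2}\right] < \left(\frac{e^{\alpha}}{1+\alpha}\right)^{-t/2}.$$

Setting $\gamma = e^{-(1+\alpha)t/2}(\varepsilon/2)^{t/2}$ and solving to obtain $\alpha = (2/t)\ln((\varepsilon/2)^{t/2}/\gamma) - 1$, it follows that

$$\Pr_{y,M_t}[Q_t < \gamma] < \left(\frac{e^{(2/t)\ln((\varepsilon/2)^{t/2}/\gamma) - 1}}{(2/t)\ln((\varepsilon/2)^{t/2}/\gamma)}\right)^{-t/2} = \left(\frac{4e}{\varepsilon t}\ln\frac{(\varepsilon/2)^{t/2}}{\gamma}\right)^{t/2}\gamma \le \left(\frac{2e}{\varepsilon}\ln\frac{1}{\gamma}\right)^{t/2}\gamma \le g^{t/2}\max\left\{\gamma, \frac{1}{K}\right\}. \quad ■$$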
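The last chain of (in)equalities in this proof is easy to get wrong, so here is a small numerical check of it, as a sketch with arbitrary parameter values chosen so that $\alpha > 0$. With $\alpha = (2/t)\ln((\varepsilon/2)^{t/2}/\gamma) - 1$, the Theorem 13 bound $(e^{\alpha}/(1+\alpha))^{-t/2}$ should equal $\left(\frac{4e}{\varepsilon t}\ln\frac{(\varepsilon/2)^{t/2}}{\gamma}\right)^{t/2}\gamma$ and be at most $\left(\frac{2e}{\varepsilon}\ln\frac{1}{\gamma}\right)^{t/2}\gamma$ for even $t \ge 2$. The helper name is ours.

```python
import math


def lemma15_terms(eps, t, gamma):
    """Return alpha and the three expressions compared in the proof of Lemma 15."""
    ell = math.log((eps / 2) ** (t / 2) / gamma)
    alpha = (2 / t) * ell - 1
    chernoff = (math.exp(alpha) / (1 + alpha)) ** (-t / 2)
    closed_form = ((4 * math.e / (eps * t)) * ell) ** (t / 2) * gamma
    simplified = ((2 * math.e / eps) * math.log(1 / gamma)) ** (t / 2) * gamma
    return alpha, chernoff, closed_form, simplified


for eps in (0.05, 0.1, 0.2):
    for t in (2, 4, 6, 8):
        # Pick gamma small enough that alpha > 0, as Theorem 13 requires.
        gamma = (eps / 2) ** (t / 2) * math.exp(-t / 2) * 1e-3
        alpha, chern, closed, simp = lemma15_terms(eps, t, gamma)
        print(eps, t, f"alpha={alpha:.2f}", f"{chern:.3e} {closed:.3e} {simp:.3e}")
```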



In the next four results, we fix a particular node $v \in T_A$, then study how the error at $v$ depends on the errors at its children $w \in S(v)$. For simplicity, we assume $v$ is an even distance from the root, but our results will apply equally to nodes an odd distance from the root. We need to upper-bound the expected difference between Alice's actual expectation $(E_{A,t}(v))_A$ and her ideal expectation $E_{A,t}(X_v)$. To this end, it will be helpful to define the following "hybrid" between $(E_{A,t}(v))_A$ and $E_{A,t}(X_v)$:

$$E^*_{A,t}(v) = \frac{\sum_{w\in S(v)} q_t(Y_w)\, f(Y_w)}{\sum_{w\in S(v)} q_t(Y_w)}.$$

To compute $E^*_{A,t}$, we use the ideal weights $q_t(Y_w)$, but we average over Alice's $K$ samples $\{Y_w\}_{w\in S(v)}$ only, not over all of $\Omega_A(X_v)$. By the triangle inequality, to upper-bound $|(E_{A,t}(v))_A - E_{A,t}(X_v)|$ it suffices to upper-bound $|(E_{A,t}(v))_A - E^*_{A,t}(v)|$ and $|E^*_{A,t}(v) - E_{A,t}(X_v)|$. We start with the latter.



Lemma 16

$$\mathrm{EX}_{y,M_t,S(v)}\left[\left|E^*_{A,t}(v) - E_{A,t}(X_v)\right|\right] \le \frac{9g^{t/2+1}}{\sqrt{K}}.$$



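Proof. Assuming $Q_t = Q$,

$$\Pr\left[\left|E^*_{A,t}(v) - E_{A,t}(X_v)\right| > x\right] < 4e^{-x^2Q^2K/2}$$

by Corollary 12. Furthermore, since $E^*_{A,t}(v)$ and $E_{A,t}(X_v)$ are in $[0,1]$, we have the trivial but important bound $|E^*_{A,t}(v) - E_{A,t}(X_v)| \le 1$. Therefore

$$\begin{aligned}
\mathrm{EX}\left[\left|E^*_{A,t}(v) - E_{A,t}(X_v)\right|\right] &= \int_0^1 \Pr\left[\left|E^*_{A,t}(v) - E_{A,t}(X_v)\right| > x\right]dx\\
&\le 4\int_0^1 \mathrm{EX}_{Q_t}\left[e^{-x^2Q_t^2K/2}\right]dx\\
&= 4\int_0^1\int_0^1 \Pr_{Q_t}\left[e^{-x^2Q_t^2K/2} > z\right]dz\,dx\\
&= 4\int_0^1\int_0^1 \Pr_{Q_t}\left[Q_t < \frac{1}{x}\sqrt{\frac{2}{K}\ln\frac{1}{z}}\right]dz\,dx\\
&\le 4\int_0^1\int_0^1 \min\left\{1,\ g^{t/2}\max\left\{\frac{1}{x}\sqrt{\frac{2}{K}\ln\frac{1}{z}},\ \frac{1}{K}\right\}\right\}dz\,dx\\
&\le \frac{9g^{t/2+1}}{\sqrt{K}}.
\end{aligned}$$

Here the fifth line uses Lemma 15, and the last line uses

$$\int_0^1 \min\left\{1, \frac{x_{\min}(z)}{x}\right\}dx = x_{\min}(z)\left(1 + \ln\frac{1}{x_{\min}(z)}\right), \qquad x_{\min}(z) = g^{t/2}\sqrt{\frac{2}{K}\ln\frac{1}{z}}.$$

By straightforward integral approximations, the resulting expression is at most $9g^{t/2+1}/\sqrt{K}$ for sufficiently large $K$. ■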

For each child $w \in S(v)$, let

$$\eta_t(w) = \sum_{u\in\{1,3,\ldots,t-1\}} \left|(E_{B,u}(w))_A - E_{B,u}(Y_w)\right|$$

measure the total error in Alice's estimates of $E_{B,u}(Y_w)$, summed over all time steps $u < t$. The following proposition shows that to upper-bound the error in $(q_t(w))_A$, it suffices to upper-bound $\eta_t(w)$. For this proposition to hold, we need the function $\Delta$ to have bounded derivative. That is why we chose triangular instead of uniform noise when defining the protocol.



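Proposition 17

$$\left|(q_t(w))_A - q_t(Y_w)\right| \le \frac{\eta_t(w)}{\varepsilon}.$$

Proof. From the definition of $\Delta$,

$$\left|\Delta(m_{u+1}, (E_{B,u}(w))_A) - \Delta(m_{u+1}, E_{B,u}(Y_w))\right| \le \frac{1}{\varepsilon}\left|(E_{B,u}(w))_A - E_{B,u}(Y_w)\right|.$$

Furthermore, $\Delta(m_{u+1}, (E_{B,u}(w))_A)$ and $\Delta(m_{u+1}, E_{B,u}(Y_w))$ are both bounded in $[0,1]$. It follows that

$$\left|(q_t(w))_A - q_t(Y_w)\right| = \left|\prod_u \Delta(m_{u+1}, (E_{B,u}(w))_A) - \prod_u \Delta(m_{u+1}, E_{B,u}(Y_w))\right| \le \sum_u \frac{1}{\varepsilon}\left|(E_{B,u}(w))_A - E_{B,u}(Y_w)\right| = \frac{\eta_t(w)}{\varepsilon},$$

where $u$ ranges over $\{1, 3, \ldots, t-1\}$. ■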
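The step from the per-message bound to $\eta_t(w)/\varepsilon$ rests on the elementary fact that for numbers in $[0,1]$, $|\prod_u a_u - \prod_u b_u| \le \sum_u |a_u - b_u|$: telescope one factor at a time, and every partial product that multiplies a difference $a_k - b_k$ lies in $[0,1]$. A randomized spot check, as a sketch with names of our own choosing:

```python
import math
import random


def product_gap(a, b):
    return abs(math.prod(a) - math.prod(b))


def sum_gap(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


rng = random.Random(0)
worst = 0.0
for _ in range(10000):
    n = rng.randint(1, 8)
    a = [rng.random() for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    # The ratio never exceeds 1 when every factor lies in [0, 1].
    if sum_gap(a, b) > 0:
        worst = max(worst, product_gap(a, b) / sum_gap(a, b))
print(f"largest observed ratio: {worst:.3f}")
```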
Now let

$$H = \sum_{w\in S(v)} q_t(Y_w), \qquad (H)_A = \sum_{w\in S(v)} (q_t(w))_A,$$
$$F = \sum_{w\in S(v)} q_t(Y_w)\, f(Y_w), \qquad (F)_A = \sum_{w\in S(v)} (q_t(w))_A\, f(Y_w),$$

so that $E^*_{A,t}(v) = F/H$ and $(E_{A,t}(v))_A = (F)_A/(H)_A$. Using Lemma 15, we can upper-bound the probability that $H$ is too much smaller than its mean value.



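Corollary 18 For all $\gamma > 0$,

$$\Pr_{y,M_t,S(v)}[H < \gamma K] \le 3g^{t/2}\max\left\{\gamma, \frac{4\ln K}{K}\right\}.$$

Proof. By the principle of deferred decisions, we can think of each $q_t(Y_w)$ as an independent sample of a $[0,1]$ random variable with mean $Q_t$. Then $H$ is a sum of $K$ such samples. Setting $\Gamma = \max\{2\gamma, 8(\ln K)/K\}$, by Lemma 15 we have

$$\Pr_{y,M_t}[Q_t < \Gamma] \le g^{t/2}\max\left\{\Gamma, \frac{1}{K}\right\} = 2g^{t/2}\max\left\{\gamma, \frac{4\ln K}{K}\right\}.$$

Furthermore, assuming $Q_t \ge \Gamma$, the Chernoff bound (Theorem 5) yields

$$\Pr_{S(v)}[H < \gamma K] \le \Pr_{S(v)}\left[H < \frac{Q_t K}{2}\right] \le e^{-Q_t K/8} \le e^{-\Gamma K/8} \le \frac{1}{K}.$$

The corollary now follows by the union bound. ■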

The last piece of the puzzle is to upper-bound the difference between $(E_{A,t}(v))_A$ and $E^*_{A,t}(v)$, using techniques similar to those of Lemma 16. Let $\eta = \mathrm{EX}_{w\in S(v)}[\eta_t(w)]$ and $\bar\eta = \mathrm{EX}_{y,M_t,T_A}[\eta]$.

Lemma 19 Assuming $\eta \ge 1/K$ for all $y, M_t, T_A$,

$$\mathrm{EX}_{y,M_t,T_A}\left[\left|(E_{A,t}(v))_A - E^*_{A,t}(v)\right|\right] \le 18g^{t/2+1}\bar\eta.$$



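Proof. Using the fact that $(F)_A \le (H)_A$,

$$\left|(E_{A,t}(v))_A - E^*_{A,t}(v)\right| = \left|\frac{(F)_A}{(H)_A} - \frac{F}{H}\right| \le \frac{|(H)_A - H|}{H} + \frac{|(F)_A - F|}{H}$$

by the same trick as in Corollary 12. Furthermore, it follows from Proposition 17 together with the triangle inequality that $|(H)_A - H| \le \eta K/\varepsilon$ and $|(F)_A - F| \le \eta K/\varepsilon$. So we can upper-bound $|(E_{A,t}(v))_A - E^*_{A,t}(v)|$ by $2\eta K/(\varepsilon H)$, as well as (of course) by 1. Fix $\eta$; then

$$\begin{aligned}
\mathrm{EX}_H\left[\min\left\{1, \frac{2\eta K}{\varepsilon H}\right\}\right] &= \int_0^1 \Pr_H\left[\frac{2\eta K}{\varepsilon H} > x\right]dx\\
&\le \int_0^1 \min\left\{1,\ 3g^{t/2}\max\left\{\frac{2\eta}{\varepsilon x}, \frac{4\ln K}{K}\right\}\right\}dx\\
&\le \frac{12g^{t/2}\ln K}{K} + \int_0^1 \min\left\{1, \frac{6g^{t/2}\eta}{\varepsilon x}\right\}dx\\
&= \frac{12g^{t/2}\ln K}{K} + x_{\min}\left(1 + \ln\frac{1}{x_{\min}}\right),
\end{aligned}$$

where the second line uses Corollary 18 and $x_{\min} = (6\eta/\varepsilon)g^{t/2}$. Assuming $\eta \ge 1/K$ always, this quantity is at most $18g^{t/2+1}\eta$, so its expectation over $\eta$ is at most $18g^{t/2+1}\bar\eta$. ■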

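We are finally ready to put everything together, and show that the referee can distinguish the real and ideal conversations with bias at most $\zeta$.

Theorem 20 By setting $b = \lceil\log_2(R/(\zeta\varepsilon))\rceil + 2$ and $K = O((11/\varepsilon)^{2R}/\zeta^2)$, it is possible to achieve

$$\left|\Pr_{\omega\in\mathcal{D},\, M\in\mathcal{W}(\omega)}[\Phi(\omega, M_R) = 1] - \Pr_{\omega\in\mathcal{D},\, M\in\mathcal{B}(\omega)}[\Phi(\omega, M_R) = 1]\right| \le \zeta$$

for all Boolean functions $\Phi$.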

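Proof. Combining Lemmas 16 and 19,

$$\mathrm{EX}_{y,M_t,T_A}\left[\left|(E_{A,t}(v))_A - E_{A,t}(X_v)\right|\right] \le g^{t/2+1}\left(\frac{9}{\sqrt{K}} + 18\bar\eta\right).$$

Let $C_j$ be the set of nodes at the $j$th level of Alice's tree $T_A$. Then if $j$ is even, let

$$\Lambda_j = \mathrm{EX}_{v\in C_j}\left[\sum_{t\in\{j, j+2, \ldots, R\}} \mathrm{EX}_{y,M_t,T_A}\left[\left|(E_{A,t}(v))_A - E_{A,t}(X_v)\right|\right]\right],$$

$$\Lambda_{j+1} = \mathrm{EX}_{w\in C_{j+1}}\left[\sum_{t\in\{j+1, j+3, \ldots, R-1\}} \mathrm{EX}_{y,M_t,T_A}\left[\left|(E_{B,t}(w))_A - E_{B,t}(Y_w)\right|\right]\right].$$

By linearity of expectation, the inequality above bounds each $\Lambda_j$ in terms of $\Lambda_{j+1}$. Solving this recurrence relation, we obtain an explicit upper bound on $\Lambda_0$ at the root node that involves a power of $g$ and decreases like $1/\sqrt{K}$, and similarly for the root of Bob's tree $T_B$. So in particular, $\mathrm{EX}_{\omega,M_t,T_i}[d_t] \le \Lambda_0 + 2^{-b+1}$ for all $i, t$, where

$$d_t = \left|\operatorname{round}((E_{i,t}(\mathrm{root}_i))_i) - \operatorname{round}(E_{i,t}(\omega))\right|.$$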

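Now observe that, if we let $W_{t+1}$ be the distribution over message $m_{t+1}$ in the wannabe case, and let $B_{t+1}$ be the distribution in the unbounded Bayesian case, then

$$\left\|W_{t+1} - B_{t+1}\right\| \le \frac{d_t/2^{-b}}{L} = \frac{d_t}{\varepsilon}$$

(using $2^{-b}L = \varepsilon$), where $\|\cdot\|$ denotes variation distance. So the referee can distinguish the whole conversations with bias at most

$$\frac{1}{\varepsilon}\mathrm{EX}[d_0 + \cdots + d_{R-1}] \le \frac{1}{\varepsilon}\left(\Lambda_0 + 2^{-b+1}\right)R,$$

since variation distance satisfies the triangle inequality. Therefore, we can achieve the goal of simulation by taking $\Lambda_0 \le \zeta\varepsilon/(2R)$: our choice of $b$ gives $2^{-b+1} \le \zeta\varepsilon/(2R)$, so this guarantees $\Lambda_0 \le \zeta\varepsilon/R - 2^{-b+1}$.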
5 Discussion

"We publish this observation with some diffidence, since once one has the appropriate framework, 
it is mathematically trivial. Intuitively, though, it is not quite obvious..." — Aumann [2], on 
his original agreement result 

This paper has studied agreement protocols from the quantitative perspective of theoretical computer 
science. If nothing else, we hope to have shown that adopting that perspective leads to rich mathematical 
questions. Here are a few of the more interesting open problems raised by our results: 

• How tight is our $O(1/(\delta\varepsilon^2))$ upper bound? Can we improve our analysis to show that the discretized standard protocol uses only $O(1/\varepsilon^2)$ messages, independently of $\delta$? More importantly, is there a scenario where Alice and Bob must exchange $\Omega(1/\varepsilon)$ or $\Omega(1/\varepsilon^2)$ bits to $(\varepsilon, 1/2)$-agree, regardless of what protocol they use? Recall that the best lower bound we currently know is $\Omega(\log 1/\varepsilon)$, from Proposition 3.

• Can Alice and Bob $(\varepsilon,\delta)$-agree after a small number of steps, even if the "true" distribution over $\omega$ differs from their shared prior distribution $\mathcal{D}$? Or is there a scenario where regardless of what protocol they use, there exists a state $\omega$ for which they must exchange $\Omega(n)$ bits to agree within $\varepsilon$ on $\omega$? (It is easy to construct a scenario where the discretized standard protocol needs $\Omega(n)$ bits for some $\omega$.)

• Can the simulation procedure of Section 4.2 be made practical? That is, can we substantially reduce the number of subroutine calls, or even reduce it to a polynomial in $1/\delta$ and $1/\varepsilon$? Alternatively, can we prove a lower bound showing that such reductions are impossible?

• Can we obtain a better simulation procedure if $\mathcal{D}$ is represented in a compact form, for example a graphical model?

Stepping back, have the results of this paper taught us anything about the origins of disagreement? As mentioned earlier, it is easy to list plausible reasons why people might disagree, Aumann's theorem notwithstanding: indifference to truth, misconstrual, vagueness, dishonesty, self-deceit, mistrust, stupidity, systematic cognitive biases, no priors, different priors, different indexicality assumptions, diagonalization (as discussed in Section 2), communication cost, and computation cost, among others. But which of these reasons, if any, are fundamental? In other words, were we forced to identify a single point at which the assumptions of Aumann's theorem diverge from reality, what would it be?

Before we undertook the research described in this paper, we would have said either that 

(1) imposing reasonable communication and computation bounds is likely to change everything, or 

(2) at least one party to any persistent disagreement must be dishonest, irrational, or indifferent to truth. ^ 

Today, however, we would make an argument less technical than (1) and less misanthropic than (2): that 
even in idealized models, we should not treat agents as initially-identical Bayesian "containers" that later 
get filled with different experiences. In particular, the Common Prior Assumption (CPA) is fundamentally 
misguided. 

Presumably no one would claim that the CPA is empirically true for human beings. It seems obvious 
that, when five-year-olds go to Sunday school, they are not updating a shared prior over possible religions 
conditioned on what their teacher tells them. Rather, their priors are being "initialized" to some extent. 
Furthermore, the existence of a common prior would be astonishing from the perspectives of physics, evolutionary biology, and neuroscience, since nothing in those fields predicts or requires one. However, as Aumann [3] rightly emphasizes, the question is not whether the CPA is "true" but whether it is a useful
idealization. What we suggest is that, when trying to understand the origins of disagreement, the CPA is 
not a useful idealization. There are two main reasons for this. 

First, the CPA presents difficulties with transtemporal identity. Are you really the "same" person as you were when you were two months old? If not, then why must your posterior be obtained by updating the two-month-old's prior? The difficulties become even more severe if we adopt the many-worlds view of quantum mechanics. For then there are millions of basis states containing beings very much like you. Suppose we fix which one of those beings is "really" you at time $t$; then which one is you at times $t-1$ or $t+1$? Quantum mechanics does not fix an answer; more than that, it does not even fix the probabilities of possible answers.^ That is why Bohmian mechanics and its many variants can all be compatible with quantum mechanics, despite having different equations of motion.

^Here "indifference to truth" means choosing opinions according to their novelty, social acceptability, value in attracting sexual partners, etc. rather than evidence.

Second, the CPA begs the question of what determines the common prior. Some might argue that human 
beings' shared genetic heritage causes them (or rather, should cause them) to share a prior. But if your prior 
is to fix your initial opinions about everything, then it must assign a probability to your future experiences 
being consistent with those of (say) a five-legged extraterrestrial. Presumably that probability decreases 
dramatically once you condition on the indexical fact of your humanity. But it ought to start nonzero and 
stay nonzero, for instance because of quantum fluctuations. This raises a question: why shouldn't your 
prior equal the extraterrestrial's? After all, the extraterrestrial has to assign a probability to its future 
experiences being consistent with yours — and at a hypothetical time before either of you knows who "you" 
will become, why should the two of you reason differently? We can similarly imagine beings governed by 
different laws of physics; and these, too, should share our prior. It follows that "the" common prior, if it
exists, is not determined by anything in our genetic makeup or even the physical world. 

This leaves the possibility that mathematics or logic could determine the common prior. Along these 
lines, Schmidhuber [17] has advocated a prior in which the probability of any sequence of experiences $x$ is
proportional to $2^{-K(x)}$, where $K(x)$ is the Kolmogorov complexity of $x$, that is, the length of the shortest
computer program that outputs x. This idea has several problems, though. First, our actual experiences 
seem to have gratuitously high Kolmogorov complexity. Believers in the Kolmogorov prior are forced to say, 
without evidence, that this is an illusion. Second, why should we use Kolmogorov complexity, rather than 
(say) time-bounded Kolmogorov complexity, or perhaps the length of the shortest program that outputs x 
given an oracle for the halting problem? Third, whenever we wish to compare the probabilities of a few 
"equally complex" events, the probabilities will depend less on the events themselves than on our choice of 
programming language, so we face another arbitrary choice. 

So it seems that a common prior would be independent of the physical world and even of mathematics, 
yet would somehow be readily available to and unquestioningly accepted by every rational agent. Agents 
equipped with this prior would live a 'preprogrammed' existence, meaning that they would never change, 
only conditionalize. We have argued that this picture of the world presents serious intrinsic problems, even 
setting aside its naked implausibility. So perhaps the common prior should be jettisoned with the ether. 

But is there any principled basis for prior differences, then? Consider Shakespeare's Julius Caesar, 
debating whether to venture outside on the Ides of March. From his dismissals of omens, we know that
Caesar bases his final decision on a belief that he will not be in particular danger, rather than just a preference 
for risky actions. Yet the process of reaching the belief seems to have nothing to do with conditioning on 
evidence — or rather, it starts after the conditioning is already done. Our proposal is to view the process as 
that of Caesar choosing his prior, and thereby choosing what sort of person he is. In other words, Caesar 
assigns a low prior probability to his getting killed, for the sole reason that had he assigned a high one, he 
would no longer be Caesar but someone else.^ On this view, not only can Alice and Bob have different
priors because they are different people, but the fact that they have different priors is a large part of what 
makes them different people, rather than the same person filling two pairs of shoes. 

In saying this, we are not taking the relativist stance that any prior is "rational" for the sort of person 
who would hold that prior. If no priors are objectively more rational than others, then the word "rational" 
is meaningless, since there exists a prior to justify essentially any belief. But the question remains: is the 
number of rational priors exactly one? We have already seen an argument of Hanson [12] that it should be,
based on the concept of a "pre-prior" (that is, a prior over all possible priors). Why should Alice give her 
own prior any more weight than Bob's? Our response is simply to point out that there is a tremendous gap between empathizing with someone else's perspective and adopting it, or between calculating what your expectation would be under someone else's prior and willing that expectation to be yours. No matter how long she talks to Bob, in the end Alice must confront the irreducible fact of her individuality. As Clarence Darrow famously put it, "I don't like spinach, and I'm glad I don't, because if I liked it I'd eat it, and I just hate it."

^What quantum mechanics does fix are the probabilities of possible outcomes of a measurement. But those probabilities will only be meaningful to you if you are not part of the system being measured.

^[D]anger knows full well
That Caesar is more dangerous than he:
We are two lions litter'd in one day,
And I the elder and more terrible...
— Julius Caesar, Act 2, Scene 2